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ABSTRACT 

We  consider  the  problem  of  minimizing  a differentiable  function  of  n parameters  with 
upper  and  lower  bounds  on  the  parameters.  The  motivation  for  this  work  comes  from  the 
optimization  of  the  design  of  transient  electrical  circuits.  In  such  optimizations  the  parameters 
are  circuit  elements,  the  bound  constraints  keep  these  parameters  physically  meaningful,  and 
both  the  function  and  the  gradient  evaluations  contain  errors.  We  describe  a quasi-Newton 
algorithm  for  such  problems.  This  algorithm  handles  the  box  constraints  directly  and  approxi- 
mates the  given  function  locally  by  nonsingular  quadratic  functions.  Numerical  tests  indicate 
that  the  algorithm  can  tolerate  the  errors,  if  the  errors  in  the  function  and  gradient  are  of  the 
same  relative  size. 


t This  paper  was  presented  at  the  SIAM  National  Meeting, 
June  16-18,  1976,  Chicago,  Illinois. 
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Introduction 

We  consider  the  following  optimization  problem:  Given  a real-valued  differentiable 
function  f of  n parameters  x=  (X|,...,x„),  find  an  xopt  such  that  f(xopt)  < f(x)  for  all  x in  the 
box  C=  {x  I aj<Xj<bj,  l<j<n}.  Lower  bounds  aj  = -ac,  and  upper  bounds  bj  = +x  are 
allowed.  We  will  describe  an  efficient  variable  metric  algorithm  designed  for  this  class  of 
problems  where  f and  its  gradient  may  be  evaluated  with  errors.  At  each  iterate  x,  we 
determine  a direction  of  movement  d.  Then  a scalar  a*  is  chosen  such  that  the  next  iterate  is 
x*=  x-fa*d  and  f(x*)<f(x).  The  change  in  x and  in  the  gradient  g are  used  to  update  an 
approximation  H to  the  inverse  of  the  Hessian  of  f.  Typically,  d is  the  corresponding  approxi- 
mation to  the  Newton  direction  on  some  subset  of  the  constraints  at  x. 


We  note  that  "box"  constraints  can  be  used  to  restrict  the  parameters  to  a region  where  f 
is  well-defined  or  where  the  parameters  are  physically  meaningful.  Box  constrained  problems 
arise  naturally  in  circuit  design  problems  which  served  as  the  primary  motivation  for  this  work 
(See  section  2). 

Members  of  the  one  parameter  variable  metric  family,  Broyden  |2|, 


(Hy)(Hy)T  ss^ 

H = H -f-—  + <f.a 
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s Hy  s Hy 

( — —)( )’ 

b a b a 
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have  been  used  to  generate  quasi-Newton  algorithms  of  the  above  type  for  minimization  with 
and  without  constraints.  Fletcher  |3|  and  Powell  14)  survey  much  of  the  relevant  literature. 
Gill  and  Murray  [5]  contains  additional  information  and  bibliographies 

In  formula  (1)  and  in  the  remainder  of  the  paper,  s denotes  the  change  in  parameters  x 


achieved  at  the  current  iteration,  y denotes  the  corresponding  change  in  gradient,  g denotes  the 
gradient  of  f at  the  current  iterate  x,  H denotes  the  approximation  to  the  inverse  of  the 
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Hessian  matrix  at  x and  G = M '.  In  (1),  a = y'Hy  and  b=y‘s.  The  superscript  * denotes 
updated  versions  of  x,  g,  s,  y,  H and  G.  Q will  be  used  to  denote  the  Hessian  of  f at  ;'.e 
iterate  x,  A>o  will  denote  a positive  definite  matrix  A. 


We  also  use  formula  (I).  We  want  to  choose  an  update  that  can  determine  the  true 
character  of  the  Hessian  matrices  (we  will  not  force  these  approximations  to  be  positive 
definite),  and  that  can  accept  arbitrary  directions  of  movement  without  requiring  projections  of 
the  Hessian  matrix  approximations.  If  the  Hessian  approximation  is  indefinite  on  some 
iteration,  we  may  want  to  search  along  a direction  that  is  not  necessarily  a quasi-Newton 
direction  on  some  subset  of  the  constraints.  Moreover,  if  no  information  is  lost  when  const- 
raints are  encountered,  we  can  use  the  approximate  Hessians  to  tell  us,  at  each  iteration,  what 
constraints  we  should  try  to  drop.  These  requirements  lead  us  to  the  update  in  ( I ) corre- 
sponding lo  <t>  = b/(b-a),  known  as  the  symmetric  rank  one  (SRI)  update, 

(Hy-s)(Hy-s)' 

H*  = H 1- . (2) 

y'(Hy-s) 

We  also  have  an  additional  requirement  that  the  procedure  v;  choose  should  not  be  too 
sensitive  to  reasonable  errors  in  the  function  and  gradient  evaluations  in  the  sense  that  it 
should  achieve  a minimum  to  within  the  specified  error.  Because  the  update  in  (2)  can  accept 
arbitrary  directions  of  movement  and  inaccurate  line  searches,  we  therefore  expect  it  to  be 
fairly  insensitive  to  reasonable  errors. 


To  avoid  getting  lost  in  detail  we  will  describe  only  the  basic  ideas  in  the  algorithm. 
Where  possible,  the  procedures  used  are  supported  by  relevant  Lemmas  and  Theorems. 
Otherwise  we  try  to  provide  the  heuristic  reasoning  behind  the  adoption  of  the  procedures. 

In  section  2 we  describe  the  particular  application,  electrical  circuit  design  problems. 


i 


which  motivated  the  development  of  this  algorithm.  Typically,  for  such  problems  each 
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function  and  gradient  evaluation  is  expensive,  costing  many  times  the  cost  of  the  auxiliary 
variable  metric  computations.  Furthermore,  these  values  are  computed  with  error. 

In  Section  3,  the  basic  algorithm  is  outlined.  In  Section  4,  we  further  motivate  our  choice 
of  the  SRI  update,  discuss  the  difficulties  in  updating  occasionally  encountered,  and  proce- 
dures for  dealing  with  such  difficulties  when  they  occur.  Some  of  this  material  is  more  fully 
developed  in  Cullum  and  Brayton  11].  The  updating  tests  used  are  directly  related  to  questions 
of  dependency  of  the  directions  of  search  generated  and  of  the  singularity  of  the  approximate 
Hessian  matrices  generated.  We  do  not  require  the  approximating  matrices  to  stay  positive 
definite,  only  nonsingular. 

In  section  5,  we  describe  the  procedure  used  for  determining  the  direction  of  movement  at 
each  iterate.  A quadratic  program  whose  size  equals  the  number  of  active  constraints  is  solved 
at  each  inleration.  If  this  fails  for  some  reason,  a full-sized  OP  is  solved.  This  differs  from 
other  procedures,  (see  Mangasarian  [6]  and  Fletcher  (7|),  which  solve  a full-sized  quadratic 
program  at  each  iteration.  In  Section  6,  we  describe  the  scaling  heuristics  used.  The  scales 
obtained  are  used  to  generate  minimal  step  requirements  in  each  line  search,  to  obtain  an 
initial  guess  on  the  Hessian  approximation,  and  in  determining  when  a constraint  is  on  a 
boundary. 

In  Section  7,  we  describe  briefly  the  line  search  procedures  used.  Bending,  due  to 
McCormick  (S),  is  used  to  avoid  zigzagging.  Searches  are  made  along  lines  not  rays.  Section 
8 contains  remarks  about  the  use  of  non-quasi-Newton  directions  and  the  convergence  test. 

In  Section  9,  the  results  of  several  numerical  tests  are  described.  Two  types  of  tests  were 
made.  One  to  demonstrate  results  for  box  constraints,  and  one  to  demonstrate  the  behavior  of 
the  algorithm  when  there  are  errors  in  the  function  and  gradient  evaluations. 


Extensions  to  more  general  constraints  are  not  discussed. 
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2.  Circuit  Design  Applications 

The  following  simple  example  illustrates  the  use  of  optimization  in  time  domain,  circuit 
design  problems. 

Example 

'''  Vg(t) 

j -i 


Given  an  input  voltage  Eft)  determine  appropriate  values  of  the  resistances  X|  and  x->  to  make 
the  output  voltage  V„(t)  match  a specified  desired  voltage  Vft)  as  closely  as  possible  in  some 
norm.  For  example,  minimize,  f(x)=  | V„  - V || where  II  • II  t is  the  l.i  - norm.  (and  thus 
f(x))  is  evaluated  by  solving  a system  of  ordinary  differential  equations  that  describe  the 
voltages  and  currents  in  the  circuit.  In  addition,  the  resistances  are  constrained  in  size, 
aj<Xj<bj,  i=l,2.  For  example  Xj  > o is  a restriction  that  if  violated  would  produce  a circuit 
that  would  not  make  sense  physically. 

In  general  in  a transient  circuit  design  problem,  the  function  to  be  minimized  has  the 

form 

f(x)  = I h(u(x,t),x,t)dt  (.4) 

o 
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with  the  constraint  set  C = |x|a<x<b|  . u(x,t)  is  the  solution  of  a system  of  ordinary  differen- 
tial equations  which  depends  on  the  parameters  x: 


u = W(u.x.t)  It  1»),  r|.  (4) 

Using  variational  arguments  (see  for  example,  Uestenes  19|),  we  can  show  that  the  gradient  of 
such  a function,  g(x),  is  expressible  as  the  integral  of  the  Lagrange  multipliers  A(t)  associated 
with  the  differential  equation  constraints  (4).  For  example, 


and  denotes  partial  derivative.  This  method  of  obtaining  the  gradient  is  called  the  adjoint 
method.  A 3 parameter  problem  described  in  Brayton,  Uachtel,  and  Gustavson  |l()|  has,  ISO 
nodes,  460  branches,  1 12  nonlinear  elements  and  192  differential  equations. 


Analytical  expressions  for  f and  g are  not  available.  Both  are  computed  with  truncation 
error,  due  to  the  finite  step  size  used  in  solving  the  differential  equations,  and  each  evaluation 
is  expensive.  We  note,  however,  that  once  the  function  is  evaluated  at  some  x,  then  the 
corresponding  gradient  evaluation  costs  approximately  another  function  evaluation.  The 
truncation  error  varies  from  iteration  to  iteration  and  is  controlled  roughly  by  parameters  set 
by  the  user.  The  greater  the  accuracy  required,  the  longer  the  lime  requireil  to  evaluate  f 
Thus,  the  philosophy  of  optimization  is  different  here  than  in  the  typical  minimization 
literature.  We  do  not  want  and  in  fact  cannot  compute  arbitrarily  accurate  minima;  we  want  a 
"good  enough"  approximation.  This  error  situation  is  illustrated  in  l•igure  I where  the  error  in 
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the  function  evaluation  of  a unimodal  function  of  I variable  is  shown.  A similar  graph  could 
he  drawn  for  the  gradient  of  f.  We  know  f only  to  within  the  tube  Any  line  search  to  find 
the  minimum  of  f in  f'igure  I should  stop  within  the  indicated  stopping  region,  because  to 
within  the  specified  error,  all  such  x are  equally  good  minima  of  f.  If  a better  approximation 
to  the  minimum  is  required,  the  user  supplied  error  control  parameters  must  be  tightened,  but 
then  each  function  and  gradient  evaluation  becomes  more  expensive 

The  truncation  error  affects  the  line  search,  the  updating,  and  the  convergence  tests. 
Basically,  we  need  a forgiving  algorithm,  one  where  the  steps  do  not  have  to  be  done  exactly. 

It  had  been  proposed  to  one  of  the  authors  that  the  proper  approach  in  sueii  a situation  is 
to  consider  the  output  of  our  function  evaluator  as  an  approximation  to  f,  call  it  f*',  and  then 
to  minimize  f*'  exactly.  This  makes  no  sense  in  this  setting,  since  f‘‘  = f+C|,  ami  the  gradient  g-' 
= g+Ci,  where  e^  is  not  necessarily  related  to  Cj,  and  neither  e,  nor  Ct  is  a well-behaved 
function.  We  have  to  admit  that  we  do  not  know  f and  g precisely  and  must  handle  the 
computations  accordingly. 

The  minimization  procedure  was  therefore  designed  to  be  flexible  and  user-controlled. 
Probably,  it  is  best  to  u.se  it  interactively.  Users  set  parameters  determined  by  their  knowdedge 
of  the  accuracy  of  f and  g.  These  parameters  can  be  altered  to  check  or  improve  the  accuracy 
of  the  solution  obtained. 

In  Section  9 numerical  experiments  of  the  following  type  are  discussed.  We  modify  the 
program  that  evaluate  f and  g so  that  f^  = f-t-r,  and  g,=  g-Hfi  ^■'c  returned  to  the  optimizer 
where  f|  and  are  random  variables  with  |f||  < (a-|-r|f|)  and  similarly  for  | | . a 

represents  an  absolute  error  in  f and  r a relative  error. 


Fafic  S 


,V  OiiUmc  of  Aluorilliiii. 

I'he  alt!orilhni  is  a quasi-Ncwioii  mcihoil  and  assumes  that  f can  he  adequately  approxi- 
mated near  its  minimum  by  a nonsingular  quadratic  As  with  all  quasi-Newton  methods, 
stationary  points  are  ct)mpuled.  1 here  is  no  euaranlee  that  these  are  local  minimizinj;  points, 

Davidon  |l  l|  uses  the  family  of  updates  in  (I).  .Many  of  the  ideas  in  this  algorithm  could 
be  used  in  any  extension  of  Davidon  | I 1 1 to  hanille  eonstraints  and  errors.  Although  the  SRI 
update  formula  has  limitations,  these  can  be  overcome  and  this  will  be  discussed  in  Section  4. 
We  only  note  here  that  the  tests  used  to  avoid  these  limitations  are  connected  with  maintaining 
a)  independence  of  the  directions  searched  and  b)  nonsingularity  of  the  matrices  generated, 
a)  and  b)  are  important  in  any  variable  metric  algorithm. 

The  algorithm  was  programmed  to  store  and  update  both  the  Hessian  (CJ)  and  inverse 
Hessian  (H)  approximations;  however,  it  can  be  easily  reprogrammed  to  use  only  Ci,  and  this  is 
demonstrated  in  later  sections  as  each  part  of  the  algorithm  is  described  in  detail. 

A.  Determination  of  "Best"  Feasible  Direction. 


At  each  iterate  x we  have  an  approximate  Hessian  G,  and  we  use  the  quadratic  approxi- 
mation to  changes  in  f at  x. 


,if(x)  = q(Ax)  = 


Ax'  GAx 


-I-  g'  Ax 


(b) 


to  determine  a direction  of  movement  at  x.  If  no  constraints  tire  active  at  x,  we  search  along 
the  quasi-Newton  line  z = x-l-  n Hg  through  x.  If  one  or  more  constraints  are  active  at  x,  we 
solve  the  quadratic  program,  minimize  q(Ax)  over  all  feasible  Ax,  to  obtain  the  next  direction 
of  search.  We  reduce  this  to  a quadratic  program  (OF)  of  size  equal  to  the  number  of  active 


constraints  (see  Section  5).  If  w'c  obtain  a solution  to  this  (,)F  which  is  a ilirection  of  descent. 
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and  that  satisfies  other  eriteria,  we  search  along  it.  If  not,  we  revert  to  a full  size  OP-  Note 
that  in  using  a OP  we  can  generate  a direction  of  movement  d = Ax  that  may  correspond  to 
dropping  no  constraints,  some,  or  all  of  them.  Also  note  that  if  G is  indefinite,  such  a 
direction  is  not  necessarily  a quasi-Newton  direction  on  some  subset  of  the  contraints. 

B.  Line  Search. 

We  then  compute  x*  = x + a*d  where  a*  is  chosen  to  roughly  minimize  f on  this  line.  We 
must  search  along  lines  (instead  of  rays)  since  G need  not  be  positive  definite.  Bending,  (see 
McCormick  (8|,)  is  allowed  in  the  line  search  to  avoid  zigzagging  which  can  occur  if  we 
bounce  on  and  off  of  constraints  without  achieving  minimization.  We  require  that  ||a*d|  be 
greater  than  some  minimum  step  because  of  error  considerations  (see  D and  E). 

C.  Updating. 


The  resulting  overall  changes  in  x,  denoted  by  s,  and  the  changes  in  gradient  g , denoted 
by  y,  are  used  to  update  G (and  H)  using  the  SRI  update  formulas: 


G*  = G + 
H*  = M - 


(y-Gs)(y-Gs)' 

s^(y-Gs) 

(Hy-s)(Hy-s)T 
y ' (Hy-s) 


(2') 


Updating  requires  that  the  denominators  in  (2'),  s’(y-Gs)  and  y’(Uy-s)  do  not  vanish.  In 
Section  4.  we  discuss  the  updating  tests.  If  any  of  these  tests  fail,  the  algorithm  uses  a 
direction  of  movement  other  than  a quasi-Newton  line  on  some  subset  of  the  constraints.  The 
non  qua.si-Newton  directions  discussed  theoretically  (see  rheorems  .4  and  4),  are  eigenvectors 
of  projections  of  G and  the  coordinate  directions.  However,  for  simplicity,  in  all  the  numerical 
tests  run,  only  coordinate  directions  were  usetl.  Test  failures  occurreil  infrequently  and  were 


easily  handled  using  coordinate  lines. 


D.  Scaling. 


In  circuit  design  applications  experience  ‘las  shown  that  sonic  advantages  can  be  obtained 
by  scaling  the  starting  Hessian  approximation  Ci^,  (and  H,,).  A heuristic  based  on  the  u.ser 
supplied  upper  and  lower  bounds  and  the  initial  values  ol  the  parameters  is  used.  .See  Section 
6 for  details. 

E.  Error  Control. 

The  user  specified  tolerances  are  combined  with  the  scalings  generated  to  obtain  toleranc- 
es used  in  (a)  determining  when  a parameter  is  on  a boundary  and  (b)  generating  minimal 
steps  used  in  the  line  searches.  These  tolerances  can  be  set  to  yield  a crude  approximation  to 
the  optimum  which  can  then  be  refined  by  reducing  the  tolerances. 

F.  Convergence  Test. 

Since  errors  are  present,  the  normal  convergence  tests  requiring  a small  quasi-Newton 
direction  are  not  appropriate.  Therefore,  a successive  search  of  the  n coordinate  lines  through 
a given  iterate  without  achieving  descent  terminates  the  procedure.  These  searches  use  a 
minimal  step  size  generated  from  information  provided  by  the  user.  We  note  that  with  box 
constraints,  at  the  optimum,  the  Lagrange  multiplier  for  the  j*''  parameter  is  simply  the  j''' 
component  of  the  gradient  with  the  appropriate  sign.  Therefore,  a simple  check  of  the 
gradients  of  the  active  constraints  is  sufficient  to  test  for  stationarity.  The  procedure  also 
terminates  when  a consistency  test  on  the  function  and  gradient  evauations  is  violated  more 


than  a specified  number  of  times. 
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4.  Symmetric  Rank  Update. 

As  slated  earlier  the  symmetric  rank  I update  was  used  because  it  satifies  the  ret|uire- 
ments  listed  in  the  introduction.  Theorem  1,  see  for  example  l iacco  and  McCormick  112),  is  a 
statement  of  most  of  this  fact. 

Theorem  Let  f=x’Ox/2  + d'x  be  a quadratic  function.  Let  Sj,  l<j<n  be  arbitrary 
directions.  For  any  initial  x“,  and  successive  movement  along  the  Sj  with  updating  on  each 
step  using  (2),  will  yield  OH„,g=g  for  some  m<n,  if  each  update  is  defined. 

This  theorem  can  be  relaxed  to  include  k>n  steps,  with  U*  = U whenever  the  update  is  not 
defined.  Murtagh  and  Sargent  |I31,  (I4|  and  Powell  1I5|  have  used  the  SRI  update  in 
algorithms  with  mixed  results.  In  Davidon  (ll|  an  optimally  conditioned  update  is  chosen  on 
each  iteration;  the  SRI  update  is  the  optimally  conditioned  update  in  some  situations.  Powell 
[16|  in  analyzing  the  Davidon  |1I|  work  has  used  an  update  formula  that  is  a modification  of 
the  SRI  update  to  preserve  desirable  properties  of  the  Hessian  approximations  such  as  positive 
definiteness.  The  SRI  update  need  not  stay  positive  definite,  and  it  is  susceptible  to  ill- 
conditioning;  however.  Theorem  1 makes  it  attractive  for  constrained  problems,  wluic 
successive  directions  of  movement  can  be  quite  general. 

The  other  consideration  which  lead  us  to  the  SRI  update  is  the  following.  There  are  very 
basic  differences  between  minimization  problems  with  parameter  constraints  and  problems 
without  constraints.  For  unconstrained  problems  the  assumption,  that  th'  runction  being 
minimized  is  locally  convex,  is  necessary  for  the  existence  of  a minimizer;  thus  approximation 
by  a positive  semidefinitc,  quadratic  function  is  plausible.  However,  when  constraints  are 
present,  local  convexity  is  not  necessary  for  existence,  as  the  simple  example  f(x)  = 
-X'  + 4,l<x<2,  demonstrates.  We  note,  however,  that  it  is  necessary  for  the  problem  to  be 
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locally  convex  in  those  variables  that  are  interior  to  the  constraints  at  the  inmiini/er  of  f. 
Therefore,  since  we  intend  to  keep  a full  Hessian  approximation  rather  than  a projected 
Hessian  approximation  as  is  done  in  Goldfarb  |I7|,  we  should  be  able  to  approximate  jteneral 
quadratic  functions  not  just  positive  definite  ones.  With  a full  approximate  Hessian  we  can 
utilize  the  directions  of  negative  curvature  to  improve  convergence.  The  theory  associated 
with  the  SRI  update  is  not  restricted  to  the  case  where  f is  a positive  definite  quadratic 
function.  Therefore,  this  update  can  pick  up  the  true  character  of  the  Hessian.  In  regions  of 
negative  curvature,  the  algorithm  described  here  can  generate  negative  curvature  directions  of 
search,  but  does  not  choose  such  directions  in  an  optimal  way.  A drawback,  of  course,  is  that 
the  SRI  Hessian  approximation  can  become  indefinite  even  when  f is  a positive  definite 
quadratic  function,  so  the  desire  to  approximate  general  functions  is  only  imperfectly  satisfied 
by  use  of  this  update.  We  would  really  like  to  have  an  update  formula  that  could  accept 
general  directions  of  movement  and  that  would  simulate  the  Hessian  accurately.  The  Davidon 
(III  and  Powell  ( I6(  analysis  may  yield  such  a formula  or  procedure. 

In  line  with  the  above  comments,  and  in  contrast  to  most  existing  algorithms  (Murtagh 
and  Sargent  (131  and  Gill  and  Murray  |18l)  we  do  not  try  to  keep  H*  positive  definite,  we 
worry  only  about  possible  dependence  of  the  directions  of  search  generated  and  singularity  of 
the  approximating  matrices  H*  or  G*. 

Now  consider  the  SRI  update.  The  denominator  in  (2')  can  vanish  if  (7.1)  Hy=s  or 

(7.2)  Hyi^s  but  y'(Hy-s)  = o.  The  following  theorems  tell  us  the  significance  of  (7.1)  and 

(7.2) ,  and  how  to  circumvent  these  difficulties. 

Theorem  T Consider  any  variable  metric  algorithm  for  which  a)  at  each  step  H*  is  given 
by  ( 1 ) for  some  choice  of  <(),  b)  the  direction  of  search  d at  each  stage  is  the  quasi-Newton 
direction  on  some  subset  of  the  box  constraints,  and  (c)  x*  = x+(>*d  where  <>*  is  chosen  by 


some  rule.  Then; 
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(a)  If  f is  any  function,  at  some  step  Hy=s,  H is  nonsingular,  and  the  active  constraint  set  did 
not  change  from  x to  x*,  then  d*  is  a multiple  of  the  old  direction  d. 

(b)  If  f is  quadratic,  Hy=s  at  some  step,  and  H is  nonsingular,  then  s is  the  true  Newton 


direction  at  x on  the  current  set  of  constraints. 


Note  that  this  result  holds  for  any  variable  metric  algorithm  generated  using  formulas 
from  (1).  This  theorem,  as  well  as  Theorems  3 through  6,  was  proved  in  Cullum  and  Brayton 
[1)  for  the  unconstrained  case. 

Proof  of  Theorem  ^ 

(a)  Assume  Hy=s  and  H nonsingular.  Let  E be  the  columns  of  the  identity  matrix 
corresponding  to  the  inactive  constraints  at  x.  Thus,  E^GE  is  the  projection  of  G onto  the 
space  spanned  by  the  E.  If  we  set 

^ Ax^GAx 

Af  = + g^Ax, 

2 

then  the  quasi-Newton  move  s from  x to  x*  is  s=a  Ev  where 

v=(ETGE)+ E^g  . (S) 

In  (8)  t denotes  the  pseudoinverse  of  E^GE.  Similarly,  the  quasi-Newton  direction  from  x*.  if 
the  constraints  do  not  change,  is  d*  = E(E rG*E)'*'E^g*  Since  Hy=s,  then  g*=g+Gs  and  G*=G. 
Therefore,  E^g*  = E*^g  + oE^GEv  and  from  (8), 

E^GEv*  = pTg*  = (l+n)E'GEv. 

Hence,  v*  differs  from  v by  some  vector  in  the  null  space  of  E’GE.  But  v*  and  v are  obtained 
from  the  pseudoinverse,  which  gives  vectors  whose  projections  on  this  null  space  is  zero. 


(b)  If  f is  quadratic,  then  the  true  Newton  direction  on  the  current  constraints  at  x is  the 


Page  14 


r 


vector  F.w  where  w is  the  vector  of  minimum  norm  that  saiishcs  the  equation  w = - I'.’g. 

Since  Hy=s=al’.v,  tlien  Qf-v  = Ciliv  and  (ll’Uliiv  = (E-  'til  )v  = - I.'g,  Iherefore,  v = w. 

OE.D. 

Thus,  condition  (7,1)  indicates  dependency  problems  for  the  directions  of  search  being 
generated,  if  f is  not  essentially  quadratic  in  the  region  near  x. 

Now  consider  condition  (7.2).  We  have  the  following  theorem,  whose  proof  is  independ- 

j 

ent  of  the  presence  of  constraints  and  is  given  in  Cullum  and  Brayton  [ 1 J.  It  only  applies  to 

i 

i the  SRI  update  and  connects  (7.2)  to  the  singularity  of  the  update  Ci*. 

Theorem  T For  any  f,  if  at  some  step  of  an  SRI  procedure  Hy#s,  but  y'(Hy-s)=(),  then 
the  SRI  updated  Hessian  approximation  G*,  given  by  (2').  is  singular  and  Hy-s  is  a zero 
eigenvector  of  G*.  Conversely,  if  G is  nonsingular  and  G*z=0,  then  /.  = g(Hy-s).  A similar 
result  holds  for  H*  when  s'(y-Gs)  = o but  y-Gs#o. 

Theorem  4 tells  us  how  to  correct  problem  (7.1),  and  Theorem  5 tells  us  how  to  handle 
problem  (7.2).  Each  is  an  extension  of  results  in  Cullum  and  Brayton  ( 1 1 to  include  const- 
raints. 

Theorem  A Let  f be  any  twice  continuously  differentiable  function.  Let  O be  the  Hessian 
of  f at  X*.  Let  W|,,  l<k<m,  be  orthonormalized  eigenvectors  of  E'GE  where  E is  the  nxm 
j matrix  whose  columns  are  + or  - the  columns  of  the  identity  matrix  that  correspond  to  the 

^ inactive  constraints  at  x*.  Then  if  the  current  G does  not  generate  the  true  Newton  direction 

I 

on  the  set  of  constraints  at  x*,  there  exists  a w^  such  that  (g*)'  F.W|^5^o,  (i.e,  E-.Wi^  is  a feasible 
descent  line)  and  (HO-l)EW|^ipto. 

Proof  of  Theorem  4. 


Let 
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W|=  |W|j  I E ^GEw|^  = E • OEW|^  ( and 
W2=  I W|.  not  in  W , j. 

If  for  all  Wj,  e Wt  , W|^ ' (li  ’ g*  ) = (),  then 

E^^g*=  W|V 

where  the  columns  of  W|  are  just  the  vectors  in  W|,  Note  that  each  vector  in  W,  is  an 
eigenvector  of  E'OE,  and  therefore,  is  also  an  eigenvector  of  (E^^QEI^  Therefore,  the  true 
Newton  direction  Ez*  satisfies 

z*  = - (ETQE)t\V,v=  - W|A\ 

where  A ^ denotes  the  diagonal  matrix  of  the  reciprocals  of  the  nonzero  eigenvalues  of  E'GE 
corresponding  to  W,  with  the  reciprocal  of  any  zero  eigenvalue  set  to  zero.  Similarly,  the 
quasi-Newton  direction  Ez  satisfies 

z=  -(ETGE)'^W|V  = - W,At  V 

Therefore,  z*  = z,  contradicting  the  assumption  that  the  current  direction  is  not  the  true 
Newton  direction.  Thus,  there  must  be  at  least  one  Wj,e  W2  such  that  W|(’E'g*#o.  But,  Wj^e 
W2  implies  that  E^GEwjj^E^OEW|^  and  since  G is  nonsingular,  (HO-l)EW|. #0. 

o.E.n 

Thus,  theorem  4 says  if  at  some  point  x*  we  have  Hy  = s,  then  either  a)  by  considering  the 
eigenvectors  of  E^GE  we  can  obtain  at  x*  a direction  of  descent  d*  = EW|^  for  some  k and  for 
small  steps  a,  Hy*?4s*,  or  el.se  b)  -Hg*  is  the  true  Newton  direction  at  x*  on  the  constraint  set. 
Throughout  the  discussions,  if  E'GE  is  not  of  full  rank,  we  call  the  direction  obtained  using 
the  pseudoinverse  (E^GE)^  the  quasi-Newton  direction.  This  yields  the  direction  of  minimal 
norm  satisfying  the  quasi-Newton  stationarity  conditions.  We  should  note  that  since  I- 
contains  only  columns  of  the  identity  it  is  trivial  to  compute  E'CiE. 

Theorem  y Let  f=  x'Ox/2  + c^x+e,  where  O is  nonsingular.  Let  Hy^s,  but 
y^(Hy-s)=o  at  some  step  in  an  SRI  procedure.  Then  at  x there  exists  a feasible  direction 
d = s-»-/}ej  for  some  coordinate  line  ej  and  some  /!#()  such  that  for  some  /i#o,  x**  = x + /i(s  + /k'|) 


I’agf  16 


is  feasible,  f(x**)<f(x),  and 


(y»»)T  ([,y.,.S*«)^0  (g, 

where  y**  = g**-g,  s**  = x**-x. 

Note;  For  any  quadratic,  (9)  is  independent  of  fi.  We  write  Theorem  ,S  as  above  because  we 
must  apply  it  to  non-quadratic  functions. 

Proof  of  Theorem  ^ Define  for  any  ej  and  fi.  h{li)  = a^ji-  + hji  where  a-=  (Qej)' 
(HO-DCj  and  bj=  (Oej)  * ( HQ-Ds.  Then  (9)  is  equivalent  to  h(li)^o.  Since  the  ajj,  l<j<n, 
are  the  j'*’  diagonal  entries  of  the  indefinite  matrix  O(HQ-I),  it  is  possible  that  all  Ujj=o. 
However,  bj  is  the  projection  of  the  non-zero  vector  (HQ-Ds  onto  QCj  and  O is  nonsingular; 
hence  there  must  be  some  ej  such  that  bj^to. 

Define  U|=  icjl  bj^to  or  As  in  Theorem  4,  we  let  E correspond  to  the  inactive 

constraints  at  x*.  Note  that  if  ejeU|,  then  h(/3)  can  vanish  for  at  most  one  nonzero  value  of 
p.  Moreover,  if  h(P)=o  then  h(/?/2)?^o.  There  are  2 cases  to  consider. 

First  assume  that  there  exists  an  CjeU,  such  that  Cj'E  E'g*?to.  That  is,  e^  yields  a 
leasible  descent  line  at  x*.  In  this  case  we  simply  search  along  y=  x*  + PCj  for  a minimum  or 
until  we  reach  an  additional  constraint.  This  defines  p*,  and  if  h(/i*)=o,  then  p*  is  replaced 
by  P*/2.  Thus,  x**=  x+s+p*Cj  satisfies  the  requirements  of  the  theorem. 

In  the  second  case  no  such  Cj  exists.  We  show  that  any  ej  from  U,  is  acceptable.  Again 
there  are  2 possibilities.  First  suppose  there  is  an  CjeU,  such  that  Eej^i).  Then  we  may  use 
this  Cj  with  p free,  but  its  sign  chosen  so  that  /fe^'g  < o.  .Since  s is  a descent  direction  at  x, 
s-t-/kj  will  be  also.  Then  we  search  for  g so  that  f is  minimized  in  the  direction  s+pc^  or  a 
new  constraint  is  encountered.  The  new  point  is  x**=  x+fps  + pc^).  If  there  is  no  ejeU,  such 
that  FXj74o,  then  all  CjeU,  correspond  to  active  constraints.  Pick  any  e|cU|  and  choose  the 
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sign  of  fi  such  that,  x+/fCj  is  feasible  at  x for  all  [i  of  this  sign.  For  example  suppose  /f>o. 
y = x+/fcj  may  or  may  . not  be  a descent  ray,  but  for  fi  small  enough  y = x+s+/fCj  is  a descent 
ray.  For  example,  choose 


The  new  point  is  x**  = 
Q.F.D. 

Observe  that  the  form  of  s is  not  used  in  the  proofs  of  Theorem  3,  4,  or  5,  so  the  results 
are  valid  for  any  s,  and  even  when  we  allow  bending.  In  Theorem  2,  s is  required  to  be  a 
quasi-Newton  direction  on  some  subset  of  the  constraints.  Theorems  2 and  3 generate  the 
tests  used  on  each  iteration  of  our  algorithm  before  updating.  Theorems  4 and  5 tell  us  what 
to  do  when  one  of  these  tests  fails.  The  tests  are  as  follows. 

Test  for  dependence  of  directions  of  search.  We  check  whether 

I I Fly-s  I I 


and 

I I y-Gs  I I 

> f 

I lyl  I 

For  a quadratic,  (10)  can  be  written  as 

I I (HO-l)s  I I 

> f 

I Is|  I 

I 

. J 


(10.1) 


(10.2) 


(III) 


Now  we  minimize  along  this  ray  or  move  until  a new  constraint  is  met. 
x+/i(s+/k-j). 


I*age  IS 


and 


I I (l-GO  ')y  I 1 


> f. 


(1 1.2) 


so  using  (10)  we  are  checking  whelhcr  s is  a zero  cigenvecior  of  (IIU-I)  and  whelher  y is  a 
zero  eigenvector  ot  the  counterpart  matrix  (1-G(J  ‘). 


Te^  for  Singularity.  Using  1 heoreni  3 we  check  the  following,  rather  than  the  denomina- 
tor of  the  SR  1 update. 


I iG*(Uy-s)|  I 
I I Hy-s|  I 


(12.1 ) 


and 


I I H»(y-Gs)  I I 
I ly-Gs|  1 


(12.2) 


We  note  that  we  do  not  have  to  compute  G*  or  11*  to  compute  these  quantities.  In  fact 
condition  (12.1)  equals 

ly‘(Hy-s)l  I |y-Gs|| 
lllly-sll  ls'(y-Gs)| 

and  condition  (12.2)  is  just  the  reciprocal  of  (13).  By  checking  both  Ci*  and  II*  for  singulari- 
ty, we  are  (at  least  heurtsiically ),  controlling  the  condition  of  each  of  these  matrices  by 
controlling  the  largest  and  the  smallest  eigenvalues  of  each. 

In  Cullum  and  Hrayton  1 1 |,  we  give  an  example  using  the  tests  in  (12).  I hese  tests  are 
violated  only  inlreqiiently.  In  fact  in  the  algorithm  as  programmed,  eigenvectors  of  II  are  not 


computed,  only  coordinates  lines  arc  userl.  In  each  case  that  either  test  (10)  or  (12)  has 
failed,  the  simple  proceilure  of  taking  an  Cj  that  is  a direction  of  descent  at  x*  has  given  a good 
update.  Typically,  only  1 extra  line  search  was  needed  to  complete  the  upd.iie.  I'heoretically. 
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the  proeedures  proposed  eould  be  eostly,  involving  several  line  searehes.  However,  in  praetiee 
the  cost  seems  nominal. 

If  tests  (10)  and  (12)  are  both  pas.sed,  then  the  G and  H matrices  are  updated.  Before 
proceeding  we  note  that  only  G is  reejuirc'd  for  these  tests,  since  we  can  compute  the  vector 
Hy-s  by  solving  the  equation  Gp=  y-Gs  for  p. 
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y C'otislraiiu'tl  DirL'clioiis  o[,  S>.'ar<-'li 

Many  linearly  CDnslraiiicO  variable  melric  algorithms  reijiiire  mininii/alion  on  the  current 
set  ol  constraints  belore  tiny  ol  these  constraints  can  be  droppcil,  (see  for  example  Ciill  and 
Murray  | IS)  anil  Cioldlarb  1 17|  ) In  addition  these  algorithms  often  drop  only  one  constraint  at 
a lime.  Algorithms  which  allow  dropping  several  constraints  simultaneously,  generally  solve 
full  dimensional  ipiadratic  programs  ;it  each  iteration  to  determine  which  constraints  to  drop, 
(see  for  example  I'letcher  |7|  and  Mangasan;in  |(»))  l-or  additional  comments  see  Ciill  and 
Murray  |5|. 

I he  scheme  proposed  below  is  eipiisaleni  to  solving  a dual  to  the  original  problem.  I his 
dual  is  the  same  si/e  as  the  number  ol  active  constraints.  If  at  a given  iteration  there  are 
active  constraints,  we  consider  the  following  c|uadr;itic  programming  problem;  minimize, 

Ax'CiAx 

t|(Ax)=  + i>'Ax  (14) 

subject  to  n\Ax<o.  In  (14)  G is  the  current  Hessian  tipproximation,  f:,^  is  the  nxm  matrix 
whose  columns  are  the  inner  normals  to  the  current  set  of  active  constraints.  Thus  q is  an 
approximation  to  the  change  in  f at  x for  any  admissible  Ax.  Nominally,  this  is  a full  dimen- 
sional problem  However,  Lemma  I slates  that  we  can  replace  (14)  by  a problem  whose  size 
equals  the  number  of  active  constraints.  I'he  quadratic  programming  algorithm  in  Canon, 
Ciillum  and  Polak  |l‘)|  is  used  to  solve  this  equivalent  problem,  see  (1,‘i).  If  this  algorithm 
finds  a solution  to  (15)  such  that  this  Ax  is  a direction  of  ilescent  for  f,  Ax*Ci  Ax  > o,  and  the 
diagonal  elements  of  the  projection  of  (i  onto  the  free  constraints  ;ire  all  positive,  then  we 
use  this  as  our  direction  of  search  Otherwise  we  revert  to  solving  the  fiill-sizeil  quadratic 
problem  in  (14),  imposing  upper  and  lower  bounds  anil  using  l lelcher  and  Jackson  12()|.  Ihe 
algorithm  in  |2()|  works  on  any  (JP,  if  G is  indefinite,  then  the  direction  genetaied  need  not 
be  a quasi-Newton  direction  on  some  subset  of  the  constraints  If  the  direction  generated 
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vanishes,  we  ehoose  a coordinale  line  lo  search.  I his  is  equivalent  to  checkinjt  the  multipliers 
and  releasing  a constraint  whose  multiplier  has  the  wrong  sign. 


L.emma  ^ I he  OP  in  (14)  is  equivalent  to  the  following  OP  of  size  m,  the  number  of 
constraints  active  at  x:  minimize 


h (V)  = 


vf^fE,  ' HE.,)v 
1 


- v'E^'llg 


(15) 


subject  to  v>o.  Equivalence  means  equivalence  of  the  associated  necessary  conditions  of 
optim.iliiy. 

Proof  ^ L.emma  J_.  The  necessary  conditions  of  optimality  for  the  problem  in  (14)  are 


GAx  + g - l-',,v  = o with  v>o,  v*(E^’Ax)  = o,  and  I!,_‘Ax>o.  (16) 

Since  Ci  is  invertible,  we  have 


Ax  + Hg  - HE^v=o.  (17) 

This  equation  projected  onto  the  active  constraints  becomes 

Ej ' Ax  + E, ' Hg  - Ej'HE^v  = 0,  v'  (L£^ ' Ax)  = o,  v > o,  E.^ ' Ax  > o.  (18) 

But  these  are  the  necessary  conditions  of  optimality  for  the  mxm  QP,  in  (15)  where  the 
multipliers  for  this  problem  are  just  the  vector  EJAx.  Thus  any  solution  of  (16)  yields  a 
solution  to  (18)  and  conversely,  given  a solution  to  (18)  we  can  generate  a .solution  to  (16) 
using  the  equations  in  (17)  to  solve  for  the  unconstrained  components.  (J.F.D. 

If  G>(),  then  any  solution  of  (16)  yields  a minimizing  point  of  q,  and  Ax  is  the  optimal 


direction  of  movement  for  q 
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We  note  that  if  only  C5  is  availahic  in  the  program,  we  can  generate  the  necessary  vectors, 
for  this  ct)mputation  hy  solving  the  equations  = for  the  matrix  1’  We  also  note 
that  (15)  is  the  iliial  of  (14).  Murray  |211  mentions  using  the  dual  problem,  but  no  details  are 
given. 


For  this  approach  to  be  useful,  we  need  a good  quadratic  programming  algorithm. 
Obviously,  if  fi.,Mli;,^  is  indefinite,  (15)  has  no  solution.  If  we  put  an  upper  bound  on  Ax, 
then  a solution  could  exist,  but  the  reduction  in  dimensionality  would  not  occur.  The  quadratic 
programming  algorithm  described  in  Canon,  Ctillum  tmd  Polak  |1‘)|  has  the  following  proper- 
ties. Consider  the  OF.  minimi/e 


: ' A/. 


-I-  d ' / 


subject  to  Fx>o.  Then  (a)  if  A>o  and  a finite  solution  exists,  this  algorithm  will  find  that 
solution,  (b)  if  A>o  and  there  is  no  finite  solution,  the  procedtire  will  generate  an  infinite 
ray,  and  (c)  for  general  A,  in  a finite  number  of  steps,  this  proceilure  either  generates  ;i  /. 
that  satisfies  the  stalionarity  conditions  or  indicates  that  no  solution  exists.  For  our  prttblem 
(15),  (b)  cannot  occur. 


f.emma  2.  If  G>o,  then  f:^’HF,.>o  and  the  QP  in  (15)  always  has  a finite  solution. 


Proof  of  Lemma  2^  Since  E,^'Hfi^  is  the  projection  of  II  >o  onto  the  space  spanned  by  li,_,  it 
must  be  positive  semi-definite  on  this  subspace.  No  infinite  solution  can  exist  since  the  null 
space  of  E.,'llF',  and  of  the  constraint  v>o  is  trivial. 


(^  .F,  n 
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The  use  of  (15)  allows  us  to  work  with  QP’s  of  the  same  size  as  the  eurreiit  number  of 
active  eontraints.  Both  QP’s  allow  us  to  drop  none,  one,  some  or  all  of  the  active  constraints. 
The  QP  gives  us  only  the  direction  of  search  and  not  the  step  size  which  is  obtained  by  a line 
search. 
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(x  A Scalinu  I Icurislic-. 

In  the  circuit  design  applications,  because  of  wide  variations  in  scale  between  different 
elements  of  the  circuit,  some  advantages  can  be  obtained  by  scaling  the  starting  Hessian  matrix 
Ci|,,  We  want  to  determine  a matrix  I’  such  that  the  function  f-(x')=  f(Px')  is  better  condi- 
tioned than  f(x).  Powell  |22|  h;is  shown  that  by  simply  setting  Ci,|=  P 'P  ' instead  of  I, 
subsequent  steps  in  the  x-space  arc  identical  to  steps  that  would  be  taken  in  the  x'  space  if  we 
explicitly  made  this  substitution  and  worked  directly  with  r'(x'). 

The  following  heuristic  was  useil  to  generate  a P.  We  use  a simple  diagonal  scaling.  It  is 

assumed  that  the  user  has  supplied  realistic  upper  and  krwer  bounds  a^  and  b^,  l<j<n,  and/or 

a reasonable  initial  value  Xj"  for  each  parameter.  Then  p^^  is  computed  as  follows. 

•• 

Pjj  = min  ( I Xj”  I , (bj-ap)  if  | Xj'’  | > I and  I < b-aj  <x 

max  ( I Xj"  I , ( b|-aj))  if  1 X|'’ | < I ;md  b-aj  < 1 

< 

max  ( 1 , I Xj"  I ) if  bj-aj=oc 

I otherwise 

k 

For  example  if  Xj"  - It)  ’,  bj-aj  = lO  "*,  then  p^^  = 11)  ^.  If  = I0-,  b|-;i|  = in'*,  then 
Pit  = '‘''■ 

Whenever  the  algorithm  is  restarteil  the  (i„  matrix  is  rescaleil  using  the  current  iterate. 

We  note  that  setting  equal  to  a diagonal  matri.<  may  me;in  that  in  some 

problems.  In  particular  this  may  happen  if  the  variables  are  separable  Hut.  ol  eourse,  we  do 
not  need  we  only  need  Ollg  to  approximate  g.  We  note  also  that  as  indicated  in  Hard 

1 21 1 scaling  or  hick  of  it  in  any  of  the  variable  metric  schemes  can  cause  serious  distortions  in 
the  approximating  matrices  anil  leail  to  near  singularities.  In  the  minimi/ation  procedure  the 
user  should  incluile  all  the  information  about  the  given  problem  ih.ii  is  available,  anil  should 
anticipate  difficulties.  When  difficulties  oecur.  one  obvious  thing  lo  try  is  resetiling  the  (i 
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matrix,  perhaps  using  the  relative  sizes  of  the  current  gradient  components  as  scalings  on  each 
component. 

The  scale  p^j  are  also  used  in  the  program  with  the  user-set  tolerance  S for  (a)  testing 
when  a parameter  is  on  a boundary  and  (b)  generating  minimum  steps  that  are  used  in  the  line 
searches.  These  tolerances  can  be  set  to  yield  a crude  approximation  to  the  optimum  which 
can  then  be  refined  by  reducing  the  tolerances. 
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7^  Line  Search  Procedures, 

For  eonipleleness  the  line  search  is  outlined,  but  no  claims  for  the  superiority  of  this 
method  are  made.  The  new  elements  are  the  minimal  step  requirement  which  is  put  in  to  deal 
with  errors,  and  the  use  of  bendinp.  Except  for  the  gradient  at  x,  the  line  search  uses  only 
function  values  and  thus  can  be  used  even  if  the  gradient  is  being  approximated. 

The  initial  step  along  the  line  of  search  is  obtained  by  using  the  current  quadratic 
approximation  G.  This  first  step  is  either  ||  d ||  | g'd  | ' | d'Gd  | or  the  distance  to  the  closest 
boundary,  which  ever  is  smaller.  We  note  that  for  directions  of  negative  curvature  this  is  not 
an  optimal  choice.  If  a constraint  is  encountered  before  the  minimum  is  bracketed,  we  project 
d onto  the  new  constraint  and  continue  to  move  along  the  bent  line,  until  a bracket  on  the 
minimum  of  f is  achieved.  Once  the  minimum  is  bracketed  a quadratic  is  fit  to  the  3 function  j 

values  that  form  the  bracket.  Steps  out  are  obtained  using  quadratic  extrapolation.  If  a | 

concave  fit  is  made  on  some  step,  (indicating  negative  curvature),  a larger  extrapolation  j 

1 

outward  is  made.  Observe  that  if  we  fit  a quadratic  on  a bent  portion  of  the  line  of  search, 
see  Figure  2,  then  we  can  expect  only  a crude  fit,  since  the  one-ilimensional  quadratic 
approximation  to  f would  be  different  on  each  arm  of  the  bend. 

DIRECTION  OF 


I'igure  2 
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One  could  avoid  this  problem  by  computing  gradients  at  the  3 points  in  the  bracket  and  then 
taking  the  proper  combination  of  gradient  and  function  values  at  2 of  the  points  to  fit  a 
quadratic  on  the  associated  subinterval.  We,  however,  did  not  do  this.  In  our  implementation, 
the  bending  is  done  in  the  function  evaluation  subroutine  and  not  in  the  line  search  subroutine. 
The  steps  in  the  line  search  are  taken  along  the  original  line,  but  the  function  is  evaluated  at 
the  projections  of  these  points  onto  the  constraints,  see  Figure  .3. 


F-igure  3 


The  quadratic  fit  is  made  along  d using  the  values  X|,  Xt  ' , x^',  however,  in  each  case,  with 
the  function  evaluated  at  the  projected  points  X|,  Xi,  Xj.  This  simplifies  the  search  routine. 
As  shown  by  McCormick  |8)  bending  prevents  jamming  or  zig-zagging,  the  phenomenon  where 
the  minimization  procedure  is  completely  controlled  by  the  constraints  (/outeiulijk  |24|). 
Note  that  when  bending  occurs,  the  change  in  x,  s = x*-x,  is  not  in  the  initial  direction  tl. 
hence  is  not  a quasi-Newton  direction. 

Even  though  we  are  using  the  SRI  update  we  require  line  searches  because  they  are 
beneficial  in  controlling  the  condition  of  the  matrices  generated.  If  f is  a quadratic  function. 


r 
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then  each  successful  update  increases  the  dimension  of  the  space  S,  where  srS  implies  that 
Cis=Os.  Obviously,  if  the  directions  s are  arbitrary,  then  the  overall  problem,  find  O from 
QS  = Z = GS  can  be  arbitrarily  poorly  conditioned.  However,  with  line  searches,  the  successive 
directions  are  more  likely  to  be  nearly  conjugate  and  thus  sufficiently  independent. 

Vlinimi/.ation  also  helps  maintain  positive  definiteness.  If  G>o  and  s'(y-Gs)>o,  then 
G*>o,  although  this  is  not  necessary  in  order  that  G*>o.  If  we  minimize,  then  s'yss  -g's>o 
which  is  certainly  necessary  for  s'y>s’Gs>o.  If  we  do  not  minimize,  then  (g*)'s  would  be 
positive  if  we  overshoot  and  negative  if  we  undershoot,  and  there  is  no  guarantee  that  s’y>o. 

In  the  framework  of  function  and  gradient  evaluations  with  error,  it  is  important  not  to 
evaluate  at  points  too  close  to  each  other,  otherwise  the  error  may  dominate  the  minimization. 
Prior  to  the  beginning  of  the  optimization  procedure,  the  user  has  to  specify  a tolerance  S, 
which  is  the  minimal  step  allowed  in  each  parameter,  if  each  parameter  is  scaled  to  I . This 
tolerance  is  scaled  for  each  parameter  Xj  by  the  scaling  factors  p-  computed  in  the  H“ 
computation  previously  described  in  Section  6.  At  the  beginning  of  each  line  search  a minimal 
step 


j 

i 

I 


Min  6pii 

1 <j<n  dj 


(20) 


is  computed,  where  d is  the  direction  of  search.  I'or  example,  if  10 ‘*<X|  < 10'^,  10<X2<I00, 
and  x‘’=  (I0‘^,  20),  then  Pn=  10  ' and  Ps2=  20.  I'hen  if  the  function  had  been  evaluated  at 
X no  further  function  evaluation  is  allowed  in  the  set  |x|  |X|-X||  < lO  'i)',  | Xi-Xt  ( <20(5j. 

Thus,  it  is  possible,  even  though  g'd  < o,  for  the  search  to  indicate  that  no  descent  was 
achieved,  if  a step  larger  than  the  minimal  step  could  not  be  taken.  In  this  case  an  alternative 
line  is  searched  (as  implemented  a coordinate  line).  Probably  a better  choice  would  be  an 
eigenvector  of  G.  Ideally,  the  tolerance  is  related  to  the  accuracy  achievable  in  f and  g. 
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Two  line  searches  were  considered,  one  attempting  to  bracket  the  minimum  on  a line,  the 
other  using  a ratio  of  the  reduction  in  f last  achieved  on  that  line  to  the  estimated  next 
reduction  achievable.  Numerical  results  for  both  are  given  in  Section  9 as  search  1 and  search 
2,  respectively.  Not  requiring  bracketing  typically  reduces  the  number  of  function  evaluations 
required,  but  increases  the  number  of  iterations.  Thus,  one  would  expect  search  1 to  he  best 
for  problems  where  gradient  evaluations  are  significantly  more  expensive  than  function 
evaluations,  and  search  2 to  be  best  for  problems  where  these  costs  are  approximately  equal. 

In  the  next  section  we  describe  the  convergence  test  used  and  motivate  the  use  of  other 
directions  of  movement  (e.g.  coordinate  lines  or  eigenvectors)  in  addition  to  the  quasi-Newton 
directions. 

It  is  desirable  for  all  variable  metric  algorithms  to  have  a restart  mechanism.  In  our 
algorithm  the  following  heuristic  is  used  for  restarting.  It  is  tied  to  the  line  search.  If  (a)  n+  ! 
updates  have  been  achieved  since  the  most  recent  restart,  and  (b)  on  each  of  3 successive 
iterations,  5 or  more  function  evaluations  are  required  in  the  line  search,  then  the  procedure  is 
restarted.  is  rescaled  on  any  restart.  3 and  5 were  chosen  heuristically.  We  probably 
should  have  excluded  directions  of  negative  curvature  from  this  test;  however,  this  was  not 
done.  In  the  tests  run,  restarts  occurred  only  infrequently.  The  idea  of  throwing  away  all  the 
information  accumulated  to  date  and  starting  fresh  does  not  appeal  to  us. 
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^ Alternative  Directions  of  Search  and  the  Conver>!ences  Tests. 

In  addition  to  the  non-ijuasi-Newton  directions  that  can  be  generated  by  the  full  OP,  two 
other  families  of  vectors  can  be  used  when  the  quasi-Newton  directions  on  subsets  of  the 
constraints  are  not  acceptable.  Whenever  lly  = s.  Theorem  4 suggests  that  we  use  an  eigenvec- 
tor W|,  of  such  that  s=<»F.W|^,  lly?ts  and  (g*)'s#o.  Theorem  5 says  that  if  Hy/s,  but 

y*(My-s)=o,  then  we  can  find  a suitable  coordinate  vector  Cj  which  will  allow  an  update.  In 
the  numerical  results  described  the  eigenvectors  of  K’GH  were  not  computed,  coordinate  lines 
were  used  in  both  situations.  Initially  the  coordinate  lines  were  ordered  according  to  the  size 
of  gj,  scaled  using  the  p^^  generated  in  finding  IP’.  The  coordinate  lines  were  considered 
cyclically  within  this  ordering.  Coordinate  lines  were  also  used  whenever  the  inner  product  of 
the  projected  gradient  with  the  computed  direction  of  movement  vanished.  Since  we  do  not 
require  II  and  G to  be  positive  definite,  w'llw  can  vanish  when  Hw/o. 

If  a successive  check  of  all  the  coordinate  lines  at  a given  iterate  yields  no  descent,  then 
the  procedure  terminates.  When  we  stop,  because  of  the  minimum  step  requirement,  we  are 
only  near  a Kuhn-Tucker  (hopefully  minimum)  point.  The  distance  from  the  ’minimum’ 
depends  upon  the  distortion  of  the  local  contours  of  the  function  being  minimized.  The 
following  Theorem  gives  an  estimate  of  this  error  in  the  unconstrained  case  for  f a quadratic 
function. 

Iheorem  ^ Let  f be  a quadratic  function,  f(x)=  x'Ox/2  + d'x  where  0>o.  l-or  any 
(S>o  and  scales  p^^,  I <j<n,  if  ;it  some  point  x*,  flx’  + ^p^jCj)  > f(x*l  for  all  1 <j<n. 

Then 


1 1^,1 
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and 

(Vp^^2g^^2):/2 

where  A (O)  is  the  minimum  eigenvalue  of  0- 

Proof,  Since  f is  a convex  quadratic  function,  for  each  j and  for  all  | n | > Pj|i5, 
i/)j(a)=  f(x*+nCj)  - f(x*)  = agj*+  a-Qjj/2  >o. 

The  minimum  of  <>j(a)  occurs  at  aj=  -gj*/Qjj  tind  </>j(aj)  <o.  Furthermore,  (pj(o)=o=<p(2aj). 
Therefore,  <f>j(a)  <o  in  the  interval  |0,2aj|  and  therefore,  Pjj<5>  2uj.  Substituting  for  and 
rearranging  we  get 

I gj*  I <Pjj<50jj/2. 

(21) 

Combining  (21)  and  the  fact  that  lgopt||=o,  we  get  for  each  j 
1 10(x*-xopt)lt  1 < pjj5Qjj/2 

Therefore, 

A-  I I x*-xopt  I I -<  II  Q(x*-xopt)  I I - < — Sipjj-Ojj- 

4 

or 

I I X*-XOpt  I I < (i:p  :2)>/2. 

2A(0)  ” " 

where  ^(Q)  is  the  smallest  eigenvalue  of  Q. 

O F n 

Theorem  T If  we  used  the  eigenvectors  of  O.  V|^,  I <k<n,  in  Theorem  6,  then  we  obtain 

I Vjig*  I < |min  I I I 


(22) 
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ami 


x*-xopi  I I < — ( illmin  1 p^^l^/vJk  | 1")'^- 


(23) 


In  (22)  and  (23)  the  minimum  is  taken  over  those  k with 


Proof.  For  the  eigenvectors  v^,l  <j<n,  the  minimal  allowed  step  in  the  line  search  along 
Vj  corresponding  to  the  scales  Pi^^.l<k<n,  is  6 niin|^(Pl^l^/Vj'‘).  The  revelant  expansion  is 
</)j(a)=  f(x*  + aVj)-  f(x*)=  ag*t'j  + t«'\j/2  where  is  the  eigenvalue  of  Q corresponding  to  v^. 
But  the  minimum  of  <)>j  occurs  at  Oj=  -g*Vj/\j,  <>j(o)  = o=<)>j(2aj).  Therefore, 


S min  (Pll/,, 
k > 


k)  > 2«j  . 


Rearranging  we  obtain  (22). 

.Since  f is  quadratic,  O(x*-xopt)  = g*.  Let  V be  the  matrix  whose  columns  are  the  \j, 
I <j<n  Then  introducing  V as  a basis,  we  obtain 

(V'QV)  V'(x*-xopt)  = V'V(V'g*) 
or  .\V’(x*-xopt)  = V'g* 

where  .\  is  the  diagonal  matrix  w'ith  entries  Aj,  l<j<n.  Therefore,  from  (22) 

5 

I Vj  ‘ (x*-xopt)  I <“l 
Summing  over  components  gives  us  (23). 

OT-.D 

The  absence  of  \(0)  in  the  estimate  (23)  tells  us  (as  expected)  that  the  directions  v^  are  a 
better  choice  than  the  Cj.  Thus,  if  G is  a good  approximation  to  O,  we  would  probably  do 
better  to  use  the  eigenvectors  of  G rather  than  the  coordinate  lines. 
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In  the 
constraints. 


next  section  we  present  some  numerical  tests, 
and  with  and  without  noise  added  to  the  function 


problems  with  and  without  box 
and  gradient  evaluations. 
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^ Numcricul  Results 

A computer  program  was  developed  as  the  opiimizaiion  part  of  a jireuit  design  package 
and  has  been  used  on  circuit  design  problems.  However,  it  does  not  seem  practical  to  use  such 
functions  as  test  problems  since  other  researchers  would  not  have  access  to  the  circuit 
generating  package  used  to  evaluate  these  functions  and  gradients.  I herefore.  so  that  future 
comparisons  can  be  made,  we  present  some  results  for  a few  other  functions;  most  are 
standard  test  functions  found  in  the  opiinu/ation  literature.  I'he  test  functions  used  are  listed 
below  with  their  starting  values,  and  bounds 

Constrained  Problems 

1)  FL’NCTION  F~H()I./  (Himmelblau  |25|) 

P‘)  .-(u.-x.)''  . 

f(X|.x-,.x,)  = (el  ) -.Oli)- 

i=  I ' X I / 


= 25  + (-50  In(.OIi)) 


I 

1.5 


Bounds:  .1  < X|  < 100 

0 < x->  < 25.6 
0 < X 5 < 5 

x‘>  = (10,  1.25,  .3) 

xopt  = (50,25.0,1.5) 

f(xopt)  = 0. 

2 ) rUNCTION  f RFCP  (Dixon  |26|) 

f(X|.XT,x,)  = (X2-5)‘  + (Xi  + x,’)’  + x,-'X| 
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Hounds:  10'^  < x, 

x"  = (1,2,1) 

xopt  = ( 1()-',  1 234,  0) 

f(xopt)  = l()5()45S55 

3)  I UNC  I ION  N(3 

IH3(/|, /,,/,)  = f(  l()()/|,  .01  z,.  /.,) 

where  f(x,y  ,/)  = «/-(  1 -/i((  x + y-3)'*  + (y-x-.H)^)) 

+ y((x  + y-3)-  + (y-x-.S)') 

+ 2(  l-/)‘'(-0.‘)5x-  + l.25xy+  2.4Sy-) 

,.  = 001  , H = 2500  . Y = 5 

Hounds:  (H  < Z|  < .02 

I00<  /,  < 200 

0 < z,  < 1 

z“  = (.010,  I 10,  .0) 
zopl  = (.02,  1 10.2')  , 0) 
fH3(/opi)  = -53.5')X52‘>4 

4)  ( UNCn  iON  l IUi 

fHC)  (Z|.Z2.Z3,Z4.Z^,Z,,,)  = IH3(z,,/2./U  rH3(/4./^  .z,,) 
Hounds:  .01  < Z|,z,  < 02 


I00<  z,,z,  < 200 
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<*  < -'v-'f,  < I 

/•’  = (.0151,151,  021,  .015,  150,  .02) 

/opl  ; ilicrc  arc  several  minima, 

K/opt)  = -2''5. 40044, -.340, 001, -402. .3  1 1.332 

LJiiconslraineO  Prohlems 

( 1 ) igiNc  i ION  I Kosr 

Kxi.Xt)  = 100  (X|-  - X-.)-  (1-X|)- 

x"  = (-1.2,  1.0) 

xopi  -■  (1,1) 

f(xopt)  = 0. 

( 2 ) l-UNC  TION  1 WOOD 

f(X|,X-,,.\;,X4)  = 100(X-.-X|-)-  + (1-X|)-  + 0()(Xj-X-,’)- 

+ (1-x,)-  + lO.lKx,-!)'  4-  (X4-I)-)  + IO.S(Xj-l  Hxj-l ) 
x'>  = (-3,-1, -3,-1) 

xopt  = ( 1 . 1 , 1 , 1 ) 

Kxopl)  = 0, 

( 3 ) KINC  I ION  I I ASY 

f'x)  = x'  (,)x  - x'  g 
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where  Q = / 33  23  7S.‘;S  (i  1 .33 

53.23  45.93  h2. 14  45.11 

V 78.98  62.14  89.14  62.45 
61.33  45.11  62.45  47.05 

anil  g = (1.  4.  2,  3) 

x<>  = (0,  0,  0,  0) 

xopt  = (.1304  , .8245  . '.  4068  , ‘ .3886) 

f(xopi)  = -11.7182934 

(4)  FUNCTION  FPOWL 

f(X|,Xi,Xi.X4)  = (X|  + 10  x^)'  + 5 (Xj-x^)-  + (X2-2x;()‘*  + lOCXj-x^)'* 

X"  = (3, -1,0,1) 

xopt  = (0.0, 0,0) 

f(xopt)  = 0 

Of  the  constrained  problems,  F-HOLZ  (Minimelblau  |25|)  and  FRFCI’  (Dixon  (26|)  have 
appeared  in  the  literature.  F-B3  and  F-H6  were  constructed  specifically  for  these  tests  as 
problems  that  would  involve  much  hitting  and  releasing  of  constraints  before  a local  minimum 
could  be  achieved;  and  moreover,  so  that  their  Hessians  are  indefinite  at  each  ol  the  minima 

We  note  that  the  initial  point  used  in  the  FllOl.Z  tests  was  not  the  same  as  that  used  in 
Mimmelblau  |25|.  At  the  starting  point  in  |25|  FMIOI.Z  is  very  flat  and  the  2 Harwell  methods 
used  would  not  move  from  that  point.  To  obtain  comparisons,  we  thcrclore,  inovcil  to  a mote 


reasonable  startinu  point  The  starting  points  lor  I H3  ami  MU)  were  chosen  such  that  the 
procedure  must  circumvent  a saddle  point  to  get  to  a minima. 

I'Ur.CP  as  siatcil  in  Dixon  |2f)|  involves  a constraint  y'>>y|".  We  have  Iranslormed  the 
variables  and  mapped  this  constraint  into  X|>()  Of  course  ;it  X|=0  , I RDCI’  is  not  differenlia- 
ble;  near  X|=()  it  has  steep  derivatives  liowevcr,  the  constraint  x,  > 10'^  , is  not  too  severe,  and 
all  the  algorithms  behtive  well. 

For  the  unconstrained  prt)blems,  I'KOSfi.l 'WOOD,  and  l-FOWI  are  standard  test 
functions.  ['ROSIi  and  I’WOOD  each  have  a banami-shapcd  valley,  l-WOOD  also  has  regions 
of  negative  curvtiture  and  saddle  points.  F'i’OW'l  has  a singular  Hessian  at  its  optimum. 
f'EiASY  is  a simple  positive  definite  quailratic  function-,  everything  should  work  well  on  it. 

I'or  the  purposes  of  comparsion  we  compared  the  new  algorithm(s)  with  the  correspond- 
ing algorithms  in  the  Ihirwell  .Subroutine  Library  |27|.  Those  used  were 
lltirwell  na me  Referred  ^ ^ 

CFletcher 

VFiO'iAD  Miickicy 

VAOdAD  L'letcher 

VA04AD  Pow-no-drv 

I'he  first  two,  Vli().4A  and  VF.OSAD,  are  variable  metric  algorithms  for  linearly  constrained 
problems  with  box  constraints.  VI:I).4A  is  due  to  l letcher  |2S|  and  solves  a full  quadratic 
programming  problem  in  a variable  si/e  box  ;it  etich  ilertilion  VI  (I.SAD  is  tiue  to  Muckley  |2d| 
and  is  a modification  of  Cioklfarb's  linctirly  constraineil  iilgorithm  |I7|.  Ilie  last  two. 
VAOhAD  and  VA04AI'),  are  for  unconstrained  problems  N'.AOd.AD  is  ;i  variable  metric 
algorithm  due  to  Fletcher  |.40).  V.M)4AD  is  Powell's  no-derivative  procedure  1 .7  I | VIO.LN 
was  only  avtiilable  in  the  single  precision  version;  rouiuloll  in  the  470-l()S  could  be  lO*'.  so 
the  results  for  ('Fletcher  should  be  viewed  in  this  lighi 
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Tests  were  run  on  the  eijtht  test  runctiuns  with  and  without  noise.  I'or  each  function, 
noise,  e,  in  the  function  values  was  simulated  by  specifying  a relative  error,  r and  an  absolute 
error  a,  and  then  setting  the  error, 

e = y ( r I f I + a ) 

where  y is  a uniformly  distributed  random  variable  in  the  interval  (-1,1  1.  Noise  in  the  gradient 
was  generateil  similarly.  We  realize  that  this  imperfectly  simulates  the  effects  of  truncation 
error  for  functions  that  depend  on  the  solutions  of  differential  equaiions.  But,  we  do  not  have 
a good  model  for  the  truncation  error,  and  we  expect  that  the  simulateil  effect  is  worse  than 
the  truncation  effect.  Thus,  if  a procedure  works  reliably  in  the  presence  of  our  simulated 
error  we  expect  it  would  work  well  in  the  presence  of  truncation  error. 

f he  results  of  all  tests  are  in  Table  1 They  are  scparateil  by  function,  each  has  a run 
with  no  noise,  with  single  precision  noise,  and  with  a lot  ol  noise.  I he  number  ol  functions 
and  gradients  required  for  each  procedure  is  given,  l or  each  ol  the  Harwell  routines,  the 
convergence  criterion  was  | Ax,  | < It)  '*,  l<i<n.  I'or  the  algorithm  ilescribed  in  this  p,i|)cr, 
the  tolerance  <5=  10  "*,  and  the  minimum  Ax  was  obtained  by  sc.iling  as  ilescribed  in  Section  7 
Convergence  occurred  when  a successive  search  at  minimum  Ax  ol  the  coordinate  lines  through 
the  current  iterate  yielded  no  descent;  or  a consistency  check,  designed  to  spot  when  the  errors 
in  the  gradient  or  function  evaluations  are  dominating  the  minimi/.ition  had  been  violated  a 
total  of  .■?  times. 

In  the  test  results  in  Table  I,  the  algorithm  ilescribed  in  this  paper  is  called  lU'KI.  lot 
box  constrained  rank  J^.  Four  versions  of  BCRI  were  tested  BCKKI.I)  and  B('RI(2,I)  are 
direct  implementations  of  the  procedures  ilescribed.  differing  only  in  the  line  search  used 
Fine  search  I requires  more  function  evaluations  and  attemiits  to  bracket  the  minimum  bclorc 
leaving  the  search.  I ine  search  2 leaves  the  search  as  soon  as  it  has  achieved  a reasomiblc 
decrease  in  the  function  Both  searches  use  only  function  v.iliics  with  the  exception  ol  tlu 
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gratlicni  at  the  slartinf;  point  of  the  search  Use  of  line  search  2 often  results  in  an  increase  in 
the  number  of  iterations  over  line  search  I and  thus  an  increase  in  the  number  of  gradient 
evaluations.  In  the  circuit  design  aitplications,  onee  a function  evaluation  is  niaile,  a gradient 
evaluation  costs  the  same  as  another  function  evaluation,  so  search  2 is  reasonable  However, 
in  other  applications  gradient  evaluations  may  cost  some  multiple  of  a function  evaluation. 
For  example,  if  we  are  computing  derivatives  by  tlifferencing,  we  would  want  to  minimize  the 
number  of  iterations  and  hence  line  search  1 would  be  more  appropriate. 

BCR  I (1,1)  and  BCR  1(2.1)  both  solve  a c|uadratic  subprogram  only  on  those  iterations 
where  at  least  one  constraint  is  active.  1 ach  first  attempts  to  solve  a small  Ul'  using  the  QP 
algorithm  in  Canon,  Cullum  and  Polak  |l‘)|  If  this  fails  (which  can  happen  only  if  G is 
indefinite),  or  if  d'Gd  < 0 , g'd  > 0 or  fi'Cif-  has  a negative  diagonal  element  where  1:  is  that 
part  of  the  iilentity  corresponding  to  the  variables  designaterl  by  the  small  ()P  as  free,  then  a 
full-sized  QP  (using  the  hounds  min  ( | x,-t,  | , lOOp;,)  , 1 < i < n.  where  t,  is  the  appropriate- 
upper  or  lower  bound)  is  solved  using  Fletehcr  and  Jackson  |2()|. 

A modified  version  of  BCR  I was  also  tested,  and  this  is  labelled  BCR  1(1,4)  and 
BCR1(2,4)  in  the  table  where  1 and  2 again  correspond  to  the  2 searches  used.  BCR1(»,4) 
solves  the  full-sized  (,)P  at  each  iteration  to  determine  the  direction  of  search.  One  might 
expect,  since  this  approach  is  global,  that  convergence  should  improve  over  BCRI(l.l)  and 
BCR  1(2.1).  However,  the  numerical  results  indicate  that  this  is  not  true. 

rite  total  work  rei|uired  is  given  as  the  sum  of  ICNS  plus  (iRI)S.  I his  only  makes  sense 
if  the  function  and  gradient  evaluations  have  ci|ual  cost.  Otherwise,  the  2 terms  must  be 
weighted  by  their  relative  costs.  Moreover,  we  are  concerned  here  with  problems  where  the 
cost  of  a function  or  gratlient  evaluation  dominates  the  cost  of  the  compulations  done  in  ihe 


optimization  algorithm. 


I*at;c  4 I 


On  Ihis  basis  of  work  ilonc,  wc  would  judjic  l$C'RI(2,l)  lo  he  ihe  best  of  ihc  lour  options 
for  BCR  I.  This  method  proved  to  be  reliable  and  efficient  in  the  presence  of  box  constraints 
and  noise.  Furthermore,  the  results  indicate  that  it  compares  favorably  with  VAO'MD,  an 
excellent  unconstrained  method  (judged  by  Himmelblau  to  be  the  best  of  those  tested  by  him) 
Performance  did  not  degrade  in  the  presence  of  noise.  In  all  cases  BCRK2.I)  terminated  at 
a reasonable  point,  either  at  or  near  a local  minimum,  or  at  a point  where  it  could  not  be 
expected  to  proceed  because  the  noise  was  of  the  same  magnitude  as  the  function. 

BCRl(l,l)  was  as  reliable  as  BCR1(2,1).  In  most  cases,  it  used  more  function  values  but 
fewer  gradients  then  BCR  I (2,1).  Thus,  one  might  expect  it  to  be  more  efiicient  on  problems 
where  the  cost  of  computing  a gradient  is  a multiple  of  the  cost  of  a function  evaluation. 

We  had  expected  that  BCR  I (1,4)  and  BCR  I (2,4)  would  perform  better  than  BCR  I (1.1) 
and  BCRK2.1)  since  they  use  global  information  on  each  iteration.  However,  they  rliil  not 
Perhaps  the  size  of  the  box  used  with  the  algorithm  in  12()|  was  too  large  or  should  have  been 
allowed  to  vary  as  is  done  in  VE03A  (CFIetcher).  However,  the  results  in  Table  1 lot 
CF'letcher  do  not  indicate  that  this  woulil  be  a profitable  direction  to  pursue;  since  CFIetcher 
did  not  perform  well  on  any  problem  except  f'RFCP.  This  comment  should  be  tempered  with 
the  comment  that  CFIetcher  is  a single  precision  routine.  As  the  table  indicates  for  FB.3,  using 
the  scaling  suggested  by  Fletcher,  termination  at  the  saddle  point  occurred.  However,  using  no 
scaling,  the  procedure  went  to  the  minimum  of  F'B3.  For  FBb  it  went  to  the  stidvlle  point  on 
all  scalings  tried.  For  FHOI  /.  terminati>  n occurreil  at  a noncntieal  point. 

VFO.SAT)  (Buckley)  did  not  perlorm  reliably  on  any  function  except  FRl  ('P  As 
indicated  for  FB3  and  FBb,  when  the  initial  point  was  shifteil  closer  to  a local  minimum, 
VFiOS.AT)  produced  that  local  mininuim.  This  leails  us  to  believe  that  we  had  the  procedure  set 
up  properly  The  output  of  VFO.SAI)  obtainevi  iiulieates  that  not  enough  safeguards  are  htiilt 
into  the  line  searches.  In  parlictilar.  on  some  runs  bouiuls  weic  viol.itcil,  and  when  noise  was 
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present  the  proeeclure  did  not  recognize  it  and  terminate  the  iterations  when  the  noise 
dominated  the  minimization.  On  FB3  and  FB6,  it  terminated  normally,  but  at  points  that  are 
not  local  minima  or  saddle  points.  In  each  of  these  2 cases  it  failed  to  allow  movement  along  a 
good  coordinate  direction.  As  stated  earlier  FB3  and  FB6  were  designed  to  be  difficult  to 
minimize  because  they  require  much  hitting  and  releasing  of  constraints.  Moreover,  the 
Hessians  are  indefinite  at  the  minima. 

For  the  unconstrained  algorithms,  VAtldAO  (Fletcher)  exhibited  gooil  behavior  in  the 
presence  of  noise  on  all  functions  except  FWOOD.  We  note  that  with  a smaller  convergence 
criterion  Fletcher  tlid  achieve  the  minima  of  FWOOD  when  there  was  no  noise.  Remarks  in 
the  Harwell  write-up  of  this  program  indicate  that  some  thought  was  given  to  its  performance 
in  the  presence  of  errors  and  certainly  its  performance  in  the  presence  t)f  error  substantiates 
this. 

Tests  on  Powell’s  no  derivative  algorithm  are  included  to  see  if  a procedure  that  uses  only 
iunction  values  but  does  not  use  differencing  to  get  derivatives,  and  that  works  on  the 
principle  of  conjugate  directions,  would  work  well  in  the  presence  of  noise.  Indications  are 
that  it  works  well  with  a moderate  amount  of  noise,  at  least  t>n  the  small  dimensional  problems 
tested.  With  a lot  of  noise,  it  did  not  work  as  well  as  the  variable  metric  algorithms. 

The  work-counts  for  the  Harwell  routines  were  obtained  as  follows.  Both  Cfletcher  and 
fletcher  have  a function,  gradient  subroutine  that  evaluates  both  simultaneously,  so  that  in 
Table  I,  in  all  cases  for  these  two  methods  IT’NS  = GRDS.  From  the  write-ups,  it  was  not 
clear  whether  each  time  the  function  is  called,  the  gradient  information  is  also  used  in  these 
algorithms.  If  this  is  not  the  case,  then  these  algorithms  may  be  more  efficient  than  is 


indicated  by  Table  I.  For  Buckley,  the  function  aiul  gradients  calls  arc  separated. 
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As  is  usually  the  case  in  using  a program  written  by  someone  else,  one  is  never  absolutely 
certain  that  one  has  set  up  the  subroutine  correctly  or  that  a good  or  best  set  of  the  algorithm 
parameters  has  been  fed  into  the  program.  We  tried  to  take  the  best  set  of  parameters.  In 
addition,  when  a program  performed  unreliably,  a different  initial  point  or  scaling  was  used  to 
be  certain  that  with  a given  function  and  algorithm  a minimum  could  be  obtained  under 
different  circumstances. 
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10.  Summary.  The  le.sls  validate  the  theoretical  results  of  this  paper  that  an  algorithm  based 
upon  the  SRI  update  should  perform  reliably  and  efficiently  in  the  presence  of  box  constraints 
and  errors. 

For  the  unconstrained  problems  tested,  with  and  without  errors,  the  small  differences 
between  BCR1(2,1)  and  Fletcher  can  be  understood  by  the  fact  that  IK‘RI(2.I)  may  use  n 
additional  function  and  gradient  evaluations  in  terminating.  The  results  m Fable  1 substantiate 
the  claim  that  BCR  I is  efficient  in  the  presence  of  error.  The  results  preseni-.-d  for  constrained 
problems  do  not  allow  for  comparison  with  an  existing,  efficient,  reliable  algorithm  However. 
BCRl  was  reliable  on  box  constrained  problems  and  we  feel  we  can  imply  from  the  compari- 
sons on  unconstrained  problems  that  it  is  also  efficient. 

Moreover,  the  tests  indicate  that  all  the  points  in  the  motivation  were  well  taken.  In 
particular,  it  doe.s  seem  important  to  use  all  the  information  available  to  tell  when  to  drop  a 
constraint,  and  to  be  able  to  pick  up  the  true  character  of  a function  near  a saddle.  Moreover, 
the  BCRl  procedure  is  simple,  there  is  no  bookkeeping  and  no  matrix  projections  and  no 
explicit  use  of  multipliers  for  the  box  constraints. 

As  designed,  the  procedure  handles  only  box  constraints  directly.  In  the  circuit  design 
work,  more  general  constraints  are  treated  using  penalty  functions.  It  would  seem  that  one 
could  use  a program  such  as  this  as  an  inner  loop  in  a penalty-multiplier  procedure  for  the 
minimization  of  a general  function  subject  to  general  constraints. 

One  final  comment.  Currently,  the  algorithm  identifies  when  noise  is  controlling  the 
procedure  and  terminates  when  it  makes  such  an  identification.  It  does  this  even  though  the 
direction  that  it  has  is  good  so  that  termination  would  not  have  happened  if  a heller  inital  step 
in  the  line  search  had  been  taken  We  are  working  on  this  aspect. 


I'uuc  45 


References 

|l|  Culluni,  Jane  and  Brayton,  R.K.,  (ld76)  Some  Remarks  on  the  Symnielric  Rank  One 
Update,  IBM  Research  Center  Report  6157,  Yorktown  Heights,  New  York. 

|2|  Broyden,  C.G.  (1967)  Quasi-Newton  Methods  and  their  Application  to  |•unetion 
Minimization,  Math.  Comp.  368-3X1. 

|3|  fdetcher,  Roper  (1976)  Methods  for  Solving  Nonlinearly  Constrained  Optimization 
Problems,  University  of  Dundee,  Numerical  Analysis  Report  16 

|4l  Powell,  M.J.D.  (1975)  A View  of  Unconstrained  Optimization,  C.S.S  14,  Af'Rl  , 
Harwell,  Fngland. 

[5|  Gill,  P.E.  and  Murray,  VV  (1974)  Numerical  Methods  for  ('oiistrained  Optimization 
Academic  Press.  London  and  New  Y<'rk. 

|6l  Mangasarian,  0.1..,  (1972)  Dual,  L'easible  Direction  Algorithms.  Lechniques  ol 
0[)timization.  A.\'.  Balakrishnan.  cd  Academic  Press,  New  S'ork,  67-XS. 

|7|  F'letchcr,  R.  (1973)  An  Exact  Penalty  I unction  for  Nonlinear  Programming  with 
Inequalities,  Math  Prog.  5,  129-150 

[8|  McCormick,  Garth  (1969),  Anlizig-z.agging  by  Bending.  Mgnt  Sc  £5  (5),  315-320 

|9|  Hestenes,  Magnus  (1966),  Calculus  of  Variations  anri  Optimal  C ontrol  I'heory.  John 

Wiley  and  Sons,  Inc.,  New  York. 

1 10]  Brayton,  R.K.,  Hachtcl,  G.,  and  Gustavson,  F-.  (1971)  The  Sparse 

Tableau  Approach  to  Network  Analysis  and  Design,  IF  E'E  Transactions  on  Circuit 


T heory,  CT-18  ( I ).  101- I 13. 


Pa>!c  40 


I III  Daviclon,  William  C.  (1475)  Optimally  Concliiioned  Optimi/.alion  Algorithms  Without 
Line  Searches,  Math.  Prog.  4,  1-30. 

[12)  Fiaceo,  A.V.  and  McCormick,  CLP.  (1468)  Nonlinear  Programming.  Sequential 
Unconstrained  Minimi/ation  Techniques.  John  Wiley  and  Sons,  Inc.,  New  York, 
170-172. 

1I3|  Murtagh,  B.A.  and  Sargent,  R.W.M.  (1469)  A Constrained  Minimization  Method  with 
Quadratic  Convergence,  Optimization,  ed.  R.  F'letcher,  215-246. 

1 14)  Murtagh,  B..A.  and  Sargent,  R.W.U.  (1470),  Conipuialiona)  L.xperience  with  Quadrali- 
cally  Convergent  Minimization  Methods,  Computer  Journal,  22,  185-144 

|15|  Powell,  M.J.D.  (1970),  Rank  One  Method  for  Constrained  Optimization,  Nonlinear 
and  Integer  Programming,  ed.  J.  Abadie,  North-Holland,  Amsterdam. 

|16]  Powell,  M.J.D.  (1976),  Quadratic  Termination  Properties  of  Davidon’s  New  Variable 
Metric  Algorithm,  C.S.S.  33,  AF.RE,  Harwell,  England. 

(171  Goldfarb,  D.  (1969),  Extensions  of  Davidon’s  Variable  Metric  Algorithm  to  Maximi- 
zation Under  Linear  Inequality  and  Equality  Constraints,  SIAM  J.  Appl.  Math  17, 
739-764. 

118]  Gill,  P.  E.  and  Murray,  W.  (1974),  Newton-type  Methods  for  Linearly  C’onstrained 
Optimization,  Numerical  Methods  for  Constrained  Optimization,  ed.  P.F-.  Gill  and  W. 
Murray,  Academic  Press,  l.ondon  and  New  York, 

114|  Canon,  M.D.,  Cullum,  C.D.  and  Polak,  F".  (1970),  I heory  of  Optimal  Control  ami 
Mathematical  Programming,  McGraw-Hill,  New  York. 

120|  F'letcher.  R.  and  Jackson,  M.  P.,  (1474)  Minimization  of  a (Juadraiic 


I 


Page  47 

Function  of  Many  Variables  Subject  only  to  Lower  anil  Upper  liouiuis, 

J.  Inst.  Math  Applic.  Jj4,  159-174. 

121 1 Murray,  W.  (1969),  An  Algorithm  for  Constraineil  Minimization,  Optimization,  eil.  R. 
Fletcher,  Academic  Press,  London  and  New  York,  247-25S. 

(22)  Powell,  M.J.D.  (1971)  On  the  Convergence  of  the  Variable  Metric  Algorithm,  J.  Inst. 
Math.  Applies.  7,  21-36. 

|23|  Bard,  Y.  (196S),  On  a Numerical  Instability  of  F)avidon-Type  Methods,  Math.  Comp. 
22,  665-666. 

(24|  Zoutendijk,  G.  (I960),  Methods  of  Feasible  Directions,  Elsevier,  Amsterdam. 

(251  Himmelblau,  D.  M.,  ( 1972)  Applied  Nonlinear  Programming,  McGraw-Hill,  New  York. 

(26|  Dixon,  L.  C.  W.,  (1971)  Variable  Metric  Algorithms;  Nece.ssary  and  Sufficient 

Conditions  for  Identical  Behavior  on  Non-quadratic  Functions,  Numerical  Optimiza- 
tion Centre:  Technical  Report  No.  6. 

127)  Harwell  Subroutine  Library  (1974),  ed.  M.  J.  Hopper.  Theoretical  Physics  Division, 
Atomic  Energy  Research  Establishment,  Harwell,  England. 

(2«|  Fletcher,  R.  (1970),  An  Efficient,  Globally  Convergent  Algorithm  for  Unconstrained 
and  Unearly  Constrained  Optimization  Problems,  A.FLR.E.  Harwell  Report  i.P.  431. 

[29|  Buckley,  A.  (1973)  An  Alternative  Implementation  of  Goldfarb's  Minimization 
Algorithm,  A.Fi.R.E.  Harwell  Report  F.P.  544. 

(3()|  F'letcher,  R.  (1970)  A New  Approach  to  Variable  Metric  Algorithms,  Ihe  Computet 


Journal,  1 3,  3 1 7-322. 


1311  Powell.  M J 


Paj!L-  4S 


n ( l‘>(i4l,  .\n  I llieieni  Methoil  of  I'iniling  the  Minimum  of  a f unclion 


of  Several  Variables  Without  Calculating  Derivatives,  ('omputer  J.  7,  155-162. 


T/-3LI:  1 


■'.9 


Fi  'N . 
tJAMI'. 

FHROR 

( KFl..  , Anr..  ) 

MFTHOD 

FCNS 

GRDS 

FINAL  VALUE 

WORK 

COMMENTS 

Fts3 

( 0,0  ) 

BC-R  1(1,1) 

33 

8 

-53.5985294 

41 

( 1 ) 

1-  HJ 

( 0,0  ) 

B(  -R  1 ( 1,4) 

30 

-53 . 5985294 

39 

( 1 ) 

FB3 

( 0,0  ) 

BCR  1(2,1) 

22 

1 1 

-53.5985294 

33 

( 1 ) 

FB3 

(0,0) 

BCR  1(2,4) 

25 

10 

-53.5985294 

35 

( 1 ) 

FB3 

( 0,0  ) 

BUCKLEY 

1C 

15 

-8.517287 

- 

(5), (8) 

FH3 

10,0) 

CFF.ETCHKR 

92 

92 

L h * *4 

— 

( 3 ) , ( 9 ) 

FB3 

( lF-5,  lE-6  ) 

BCR  1(1,1) 

38 

1 1 

-53.598526 

49 

( 2 ) 

FB3 

( lK-0,  lE-fa  ) 

BCRl ( 1,4) 

4D 

1 2 

-53 . 5985. '..4 

58 

( 2 ) 

FB3 

( lE-5,  lE-G  ) 

F(CR1  ( 2,1) 

33 

15 

-53.5  <85254 

49 

i 2 ) 

FB3 

( IF- 5, lE-6  ) 

BCR  1(2,4) 

27 

1 2 

-5  3.5985.''9  3 

39 

( 2 ) 

FB3 

( IE-5, lE-6  ) 

BUCKLEY 

4,? 

lb 

-8.5173 

” 

( 5 ) 

FB3 

( lE-3, lE-4  ) 

BCRl ( 1,1) 

25 

9 

IK- 3 

34 

( 3 ) 

FB3 

( IE- 3, lE-4  ) 

BCRl ( 1,4) 

2 c 

10 

9. 9F.-4 

36 

( 3 ) 

FB3 

( IE-3, lE-4  ) 

BCR  1(2,1) 

18 

9 

IE  - 3 

27 

( 3 ) 

FB3 

( lE-3, IE-4  ) 

BCR  1(2,4) 

28 

1 1 

9.9K-4 

39 

( 3 ) 

FB3 

( 1E-3, IE-4  ) 

BUCKI.KY 

120 

14 

IE- 3 

1 34 

( 4 ),  ( 3 ) 

FB6 

( 0,0  ) 

BCR  1(1,1) 

121 

30 

”275. 49644 

151 

( 1 ) 

FBb 

( 0,0  ) 

BCR  1 ( 1,4) 

1 18 

26 

-402. 311332 

1 44 

( 1 ) 

FB6 

( 0,0  ) 

BCR  1(2,1  ) 

34 

l4 

-275.49544 

48 

( 1 ) 

FB6 

( 0,0  ) 

BCR  1(2,4) 

46 

24 

-346.091 

70 

( 1 ) 

FB6 

( 0,0  ) 

BUCKI.KY 

•4b 

3 7 

-12. 402465 

- 

( 5 ; , ( 8 ) 

FB6 

( 0,0  ) 

CFLETCHKR 

480 

480 

6.  OK  6 

■ 

( '4  ) , ( 3 ) 

FB6 

( IE-5, lE-6 ) 

BCR  1(1,1) 

42 

12 

5E-6 

- 

( 3 ) 

FB6 

( lE-5, lE-6 ) 

BCR  1 ( 1,4) 

J6 

12 

6. 3E-7 

- 

1 3 ) 

FB6 

1 FE-5, lE-6 ) 

BCR  1(2,1) 

17 

-275.49644 

61 

1 2 ) 

FB6 

( IE-5, lE-6 ) 

iiC'R  1(2,4) 

4 7 

23 

-346.091 

70 

( 2 I 

FB6 

( lE-5, lE-6 ) 

BUCKLEY 

1 15 

24 

-12.4021379 

■ 

( 5 ),(  7 ) 

FB6 

( IE- 3,  lE-4  ) 

BCR  1(1,1) 

34 

lO" 

6E-5 

44 

( 3 ) 

FB6 

( IE- 3, lE-4  ) 

Bt.'R  1 ( 1,4) 

31 

10 

3E-5 

4 1 

( 3 ) 

FB6 

( IE- 3,  lE-4  ) 

BCRl ( 2,1) 

34 

14 

1 . 2F:-4 

4 8 

( 3 ) 

FB6 

( lE-3, IE-4  ) 

BCR  1(2,4) 

27 

14 

-1 . 3F-5 

•4  1 

( 3 ) 

FB6 

( 1E-3, lE-4  ) 

BUCKI.KY 

120 

1 3 

2.066 

■ 

( 4 ) , ( 5 ) 

FRFCP 

( 0,0  ) 

BCR  1(1,1) 

60 

21 

16.5045855 

81 

( 1 I 

FFFCP 

( 0,0  ) 

BCR  1 ( 1,4) 

35 

1 2 

16.5045855 

4 7 

( 1 1 

FRFf’P 

( 0,0  ) 

BCR  1(2,1) 

27 

14 

1 6 . 5(145862 

4 1 

( 1 1 

FF'lFCP 

( 0,0  ) 

B('R  1(2,4) 

38 

20 

16.5045855 

58 

( 1 1 

FHFCI' 

( 0,0  ) 

BUCK I KY 

47 

30 

16.5045855 

77 

( 1 1 

FKFCP 

( 0,0  ) 

CFLETCUER 

19 

19 

16 . 5045’’7t. 

38 

( 1 ) 

FPFCP 

( lE-5, IE-6 ) 

BCR  1(1,1) 

56 

18 

16.5045908 

74 

( 2 1 

rHFi'P 

1 lE-5, 1E-6 ) 

B(  •)<  1 ( 1,4) 

43 

1 3 

16.5045855 

56 

( 2 ) 

FRFCP 

( IF- 5, IK-6  I 

BCR  1(2,1) 

27 

14 

1 6 . 50458i  ■ 

4 1 

1 2 ) 

FRKCi' 

( IK-5, lE-6 ) 

BCR  1 ( ? , 4 ) 

38 

20 

16. 5046O8 

58 

( . I 

FRFC'I’ 

( I F-5  , 1 F.-6  1 

BUCKLEY 

120 

3n 

16.50458C6 

150 

(41,(21 

FRF-;CP 

( lF-2, lK-3 ) 

BCR  1 ( 1,1  ) 

31 

10 

16.6575 

4 1 

( 2 ) 

FRFCI' 

( IK-P,  IF- n 

HCRK  1,4) 

31 

16.51 107 

4 0 

( .3  ) 

FRFC1> 

( 1F-2,  if;-  3 1 

13CR  1 ( .' , 1 ) 

10 

1(>.7  36 

34 

( 2 ) 

FRF('F> 

( IK-?, 1K-  3 ) 

BCRK  ?,4  ) 

3 1 

1 1 

16.556 

m2 

( 2 ) 

FRFCP 

( IF  - ?,  IK-3  ) 

BUCK  I F Y 

1 

12 

17.15 

- 

(51,(4) 

DO 


KHOI 

(0,0) 

BCR  I ( I , I ) 

112 

34 

8E-14 

146 

( 1 ) 

‘ ' ' t ■'  ' 

BCR  1 ( 1,4) 

)04 

34 

4 . 7E-1  3 

138 

( 1 ) 

l ■ , ' ) 

BCR  H 2 , 1 ) 

bO 

44 

7E-  1 3 

1 13 

( I ) 

f-'IIUl./ 

1 U , 0 ) 

BCR  1(2,4) 

63 

39 

1 .6E-1  3 

102 

( 1 ) 

HlUl,/. 

(1^0) 

BUCKEFY 

12 

8 

3.76 

- 

( 6 ) 

Fiiui.i; 

1 U , 0 ) 

CFEEICHF.H 

181 

181 

4.171 

( 4 ) 

KHOI.Z 

( lE-5,  lK-6 ) 

BCHK  1,1) 

95 

29 

6.4E-8 

124 

( 2 ) 

Knot,/. 

( lE-5,  lK-6 ) 

BCR  1 i 1,4) 

1 30 

35 

4E-7 

165 

( 2 ) 

FHOLZ 

( IF.-S,  IF.-b  ) 

BCR  1(2,1) 

70 

39 

8E-9 

109 

( 2 ) 

FH()i,Z 

( lE-5,  lE-n ) 

BCR  1(2,4) 

34 

16 

5.6E-3 

- 

( 5 ) 

FlKiL'/. 

( IE-5, lE-6 ) 

BUCKEFY 

12 

8 

3.76 

“ 

( 6 ) 

FiiOFZ 

{ IE-2,  7E-3 ) 

BCR  1 ( 1,1) 

38 

1 1 

4 . 94 

- 

I 5 ) 

MlOt.Z 

( IE-2,  lE-3  ) 

B(-'R  1 ( 1,4) 

43 

1 1 

5.53 

- 

( 5 ) 

FilOLZ 

( lE-2, lE-3  ) 

BCR  1(2,1) 

44 

20 

5E-  3 

64 

( 2 ) 

enoLz 

( IE-2,  lE-3  ) 

BCR  1(2,4) 

25 

9 

5.6 

- 

( 5 ) 

FHOLZ 

( IE-2, lE-3  ) 

BUCKEFY 

120 

1 3 

7E-4 

133 

( 4 ),  ( 2 ) 

FEASY 

(0,0  ) 

BCR  I ( 1,1) 

I I 

5 

-1  1 .7182934 

16 

( 1 ) 

FEASY 

( 0,0  ) 

BC'R  1(2,1  ) 

9 

6 

-1  1 .7182934 

15 

( 1 1 

FKASY 

( 0,0  ) 

FLETCHER 

14 

14 

-1  1 .7182934 

28 

( 1 ) 

FEASY 

( 0,0  ) 

POW.M(7-DRV. 

, 117 

-11. 7182934 

I I 7 

( I ) 

FEASY 

( )E-5, lE-6 ) 

BCR  1(1,1) 

16 

5 

-1  1 .7182932 

21 

( 2 ) 

FEASY 

( lE-5, lE-6 ) 

BCR  1(2,1) 

16 

12 

-1  1 .7182929 

28 

( 2 ) 

FEASY 

( IE-5, lE-6 ) 

FLETCHER 

16 

1 6 

-1 1 .7182933 

32 

( 2 ) 

FEASY 

( IE-5, IE-6 ) 

POV\I . NO- DRV 

1 14 

-1 1 .7182496 

1 14 

( 2 ) 

FEASY 

( lE-2, lE-3 ) 

BCR  1(1,1) 

24 

10 

-1 1 .7182679 

34 

( 2 ) 

FEASY 

( IE-2, lE-3 ) 

BCR  1(2,1  ) 

20 

10 

-1  1 .7  14795 

30 

( 2 ) 

FEASY 

( lE-2, IE-3 ) 

FLETCHER 

13 

13 

-1 1 .714877 

26 

( 2 ) 

FEASY 

( lE-2, lE-3 ) 

POW.NO-URV 

177 

-2.52 

•“ 

( 5 ) 

Fi'OWL 

{ 0,0  ) 

BCR  1(1,1) 

87 

29 

6.4E-14 

1 16 

( 1 ) 

FPOWE 

( 0,0  ) 

BCR  1(2,1) 

53 

45 

5E-r7 

98 

( 1 ) 

FFOWE 

( 0,0  ) 

FLETCHER 

48 

48 

3E-1  1 

96 

( 1 ) 

FFOWL 

( 0,0  ) 

POW.NO-DRV 

290 

3E-  1 1 

290 

( 1 ) 

FPOWE 

( ?E-5, IE-6  ) 

BCR  1(1,1) 

90 

25 

2E-7 

1 15 

( 2 ) 

F POWL 

( IE-5,  IE-6  ) 

BCRl ( 2,1) 

35 

24 

1 . 3E-6 

59 

( 2 ) 

Fpuva, 

( lE-5,  IE-6  ) 

FLETCHER 

38 

38 

1 . 3E-7 

76 

( 2 ) 

F ['( iWE 

( IE-5,  IE-6  ) 

POW.  N(J-DRV 

309 

7.6E-7 

309 

( 2 ) 

Fpnwi, 

( 1E-2, lE-3  ) 

BCR  1(1,1) 

44 

1 2 

IE-3 

56 

( 2 ) 

FF’OWl, 

1 IE-2,  IE- 3 1 

B(’R  1(2,1) 

35 

22 

7.6E-5 

57 

( 2 ) 

FP(  iWl, 

{ IE-2,  IE-3  ) 

FLETCHER 

19 

19 

3. 5E-3 

38 

( 2 ) 

FPOWL 

( IE-2, IE-3  ) 

t'OW.NO-DRV 

147 

2.6583 

~ 

( 5 ) 

FW(  K )I) 

( 0,0  ) 

BCR  1(1,1) 

154 

44 

3E-7 

198 

( 1 1 

FWiK'D 

( 0,0  ) 

BCR  1(2,1  ) 

109 

51 

2.8E- 10 

160 

( 1 1 

FW(KJl) 

( 0,0  ) 

FLETCHER 

22 

22 

7.87 

- 

( 3 ) 

FWIXH) 

( 0,0  ) 

POW.NO-DRV 

148 

6E-  1 0 

148 

( 1 ) 

FW(X)|) 

( IE-7, IE-8  ) 

BCRK  1,1) 

163 

48 

6. lE-9 

21  1 

( 2 ) 

( V.YHID 

( IE-7,  IK-8  I 

BCRK  2,1) 

126 

53 

3.5E-9 

1 ’9 

( 2 ) 

F Wi  X)l) 

( IE-7,  IF.-H  ) 

FT. ETCHER 

22 

22 

7.87 

- 

( 3 ) 

F W(  H )U 

( IE-7, lE-8  ) 

POW.NO-DRV 

157 

6.2E-9 

157 

( 2 ) 

51 


FROSE 

( 0,0  ) 

BCR  1(1,1) 

63 

21 

5.6E-10 

39 

( 1 ) 

FROSE 

( 0,0  ) 

BCR  1(2,1  ) 

b1 

34 

2.5E-1 1 

95 

i I ) 

FROSE 

(0,01 

FLETCHER 

44 

^ . 2E- 10 

88 

( 1 ) 

FROSE 

( 0,0  ) 

POW.NO-DRV 

]9 

4.5E-5 

“ 

( 5 ) 

FROSE 

{ lE-7, lE-8  ) 

BCR1 ( 1,1) 

68 

21 

9 . 8E- 1 1 

89 

( 2 ) 

FROSE 

( IE-7, lE-8  ) 

BCR  1(2,1) 

62 

30 

2. 7E-10 

92 

( 2 ) 

FROSE 

( IE-7, lE-8  ) 

FLETCHER 

44 

44 

1 . 8E-10 

88 

( 2 ) 

FROSE 

( IE-7,  IE-8  ) 

POW.NO-DRV 

27 

4. 5E-5 

“ 

( 5 ) 

FROSE 

( IE-2, lE-5  ) 

BCR1 { 1,1) 

100 

32 

3.5E-7 

132 

( 2 ) 

FROSE 

( lE-2, lE-5  ) 

BCRK  2,1) 

66 

34 

6. IE-7 

100 

( 2 ) 

FROSE 

( 1E-2, lE-5  ) 

FLETCHER 

45 

45 

1 . 3E-7 

90 

( 2 ) 

FROSE 

( IE-2, lE-5  ) 

POW.NO-DRV 

46 

2.6E-3 

- 

( 5 ) 

COMMENTS 


( 1 ) LOCAL  MINIMUM 
( 2 ) LOCAL  MINIMUM  TO  WITHIN  ERROR 
( 3 ) STUCK  NEAR  SADDLE  POINT 

( 4 ) EXCEEDED  MAXIMUM  NUMBER  OF  FUNCTIONS  ALLOWED 

( 5 ) NOT  A LOCAL  MINIMUM 

( 6 ) ABORTED  BY  GOING  OUT  OF  BOUNDS 

( 7 ) ABORTED  BECAUSE  OF  TOO  MANY  DIVIDE  CHECKS 

(S)  WENT  TO  MINIMUM  WHEN  STARTED  AT  DIFFERENT  POINT 

(9)  WENT  TO  MINIMUM  WHEN  GIVEN  DIFFERENT  SCALE 


J 


loj  (N  oj  m 04  ovj  04  un 


-tmtrtno  :ytr-t  to 

security  classification  of  This  page  i'H/hti  Dhik  Fmorrd) 


!/■ 


1.  REPOR 


REPORT  DOCUMENTATION  PAGE 


:ad  instkuctions 

....  ’E  COMIM.ETI.NT.  KOKM 


AFQSRj-TR-  7 7 ^ 9^  4| 


2.  GOVT  ^VCCESSION  N< 


4.  TITLE  (and  SubfiUo) 

AN  ALGORITHM  FOR  MINIMIZING  A DIFFERENTIABLE*^  ' 
FUNCTION  SU13JECT‘'T0  BOX  CONSTRAINTS  AND  ERRORS* 
^ I 

-Sf'  TYr»e-«rF-TrEPORT  & PCrioS  CbdCREb 

^ / 

/ fnt-prim  f jL  ' • , ■ 

e;  rienroijMiNG  oab.  repo/t  number  \ 

■'  -r  " \ 

7.  author(s; 

8^  CONTRACT  OR  grant  NUMBERf,; 

R.  K.y^rayton  and  Jany'Cullum  | /4 

F44620-76-C-0Ef22/-' 

/ 

9 performing  organi z ation  name  and  address 

IBM  Thomas  J.  Watson  Research  Center 

Yorktown  Heights,  New  York  / / 

jIIc  y/j 

10.  program  ELEMENT.  PROJECT  TASK 
AREA  A WORK  gNiT  NUMBERS 

61102F  2304/A4 

" ArrTo°rc^e"t)?fice  '^o^^'^cTen'l^m  Research/NM 
Bolling  AFB,  Washington,  DC  20332..^  r^// 

12.  REPORT  DATE  / f7^ 

:^/in/77  — 

13.  NUMBER  OF  PAGES  ^ t A jl  1 
/[§ — / ^ • 

14  MONITORING  AGENCY  NAME  a ADDRESS(ir  ComTofl^7i?t7r/iceJ 

IS.  SECURITY  CLASS,  (of  Fhes  report; 

UNCLASSIFIED 

15a.  DECLASSIFICATION  DOWNGRADING 
SCHEDULE 

16.  distribution  STATEMENT  (o/  Ki-porO  , i*  • . 

Approved  for  public  release;  distribution  unlimited. 

-•  f^c.ClPlENT•S  CAT  ALOG  NUMBER 


If, 


t7.  DISTRIBUTION  STATEMENT  (oi  the  abstract  entered  in  tHock  20,  if  different  from  Report) 


18.  supplementary  NOTES 


t9-  KEY  WORDS  (Continue  on  reverse  side  if  necessary  and  identify  hy  block  number) 


constrained  opLimization,  optimization  with  constraints,  box  constraints, 
bound  variables,  variable  metric,  quasi -Newton , optimization  with  errors 


20.  ABSTRACT  (Continue  on  reverse  side  ff  necessary  and  identifv  by  block  number) 

w Wc  consider  the  problem  of  minimi/ine  a dirfcrcntiablc  function  of  n p.irainetcrs  v.  ith  the 
parameters  restricted  to  a box  in  n-spacc.-\Vc  describe  a variable  metric  algorithm  designed 
for  this  problem.  This  algorithm  differs  considerably  from  others  proposed.  It  uses  the 
symmetric  tank  one  update  formula,  so  that  no  matrix  projections  at  the  boundary  of  tlic  box 
arc  necessary;  and  conseciuenily,  a Hessian  approximation  on  the  subspnee  spanned  by  the 
directions  previously  searched  is  available  at  each  iteration  to  use  in  determining  ubieh 
constraints  to  drop.  .Moreoser,  the  ab'.orithm  is  de.sinned  to  handle  p'obleins  vsfieie  the 
L function  and  its  pradicnl  are  evaluated  s\ith  error.  The  difficulties  associated  unit  the 
1473  COITION  or  1 NOV  6S  IS  OOSOLETE 

^ UKLASSLFIE.D 

r SECUKi  I V CL  AssiriL  AT  ION  or  this  PAGE  r'»i«  r 


nn 

< JAN  73 


// 


'//  ' 


■ f 

nf 


A 


SECURITY  CLASSIFICATION  OF  THIS  PAGEfWTien  Data  Enlarad; 


20  Abstract 


cuic  ,n.,k  one  u,,.b.c  Icm.la  «s.d  ,o 

ton  enco,,n,n,nd  in  nuninneal  oxpnncncn  .«  f J'  art  ah.nv  (1)  TlK 

dwcribcil  in  Cullum  ami  Drayton  ' ' „„consIraincil  minmn/aiion 
nipotitl,,..  r.-.;-  ,„„bion„.  ,l,c  a.y«...i.n.  ;x  toitabio 

considered. 


UNCLASSIFirn 


security  cl  ASSIf  IC  ATiON  OE  e. 


PAGI  f 


